Re: Import csv file on django view

2020-07-25 Thread Ronaldo Mata
Hi Naresh Jonnala.

Yes, that works for detecting the delimiter of the csv file, but I still don't
know how to detect the current encoding of the file.

I need to know how to implement a good csv file upload view in Django.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/CAP%3DoziSPwdGc7UYn_WrJBqLM8sL-BZV5DjtUnu7eumsgcP0jsQ%40mail.gmail.com.


Re: Import csv file on django view

2020-07-25 Thread Naresh Jonnala
Hi,

I am not sure whether this will help, but I want to add a piece of code.

import csv

# Sniffer.sniff() needs a text sample to analyse
# (assuming self.file_open is opened in text mode)
sample = self.file_open.read(4096)
self.file_open.seek(0)

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)

dialect.__dict__
mappingproxy({'__module__': 'csv', '_name': 'sniffed', 'lineterminator': '\r\n',
'quoting': 0, '__doc__': None, 'doublequote': False, 'delimiter': ',',
'quotechar': '"', 'skipinitialspace': False})


lineterminator = dialect.lineterminator
quoting = dialect.quoting
doublequote = dialect.doublequote
delimiter = dialect.delimiter
quotechar = dialect.quotechar
skipinitialspace = dialect.skipinitialspace


# pass the sniffed dialect by keyword; it is a class, not a mapping
csv.DictReader(self.file_open, dialect=dialect)


Try this.
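
For the upload case discussed in this thread, the same idea can be sketched as a small function that works on the raw bytes of the upload. This is only a sketch under assumptions: read_rows is a hypothetical helper name, the encoding is taken as already known, and the sniffer is restricted to the three delimiters mentioned in the thread (",", ";" and "|").

```python
import csv
import io


def read_rows(raw: bytes, encoding: str = "utf-8"):
    """Decode uploaded bytes, sniff the delimiter, and return the rows as dicts."""
    text = raw.decode(encoding)
    # Only consider the delimiters mentioned in the thread; sniff() raises
    # csv.Error if it cannot settle on one of them.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;|")
    reader = csv.DictReader(io.StringIO(text, newline=""), dialect=dialect)
    return dialect.delimiter, list(reader)


# Example with a semicolon-separated upload
delimiter, rows = read_rows(b"name;age\nAna;30\nLuis;25\n")
```

In a view, raw could come from request.FILES['file'].read(). Sniffing is a heuristic, so it is worth catching csv.Error and falling back to a default delimiter.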

-
Naresh Jonnala
Hindustan.


Re: Import csv file on django view

2020-07-24 Thread Liu Zheng
Yes. You are right. Pandas' default behavior is as follows:

encoding = sys.getfilesystemencoding() or "utf-8"

I tried to open a simple csv encoded as UTF-16 LE (popular on Windows),
and got the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0:
invalid start byte
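
The 0xff at position 0 is consistent with the first byte of a UTF-16 LE byte order mark (FF FE). When a file carries a BOM, the encoding can be read off the first bytes before any guessing is needed. A minimal sketch, assuming the helper name encoding_from_bom (hypothetical) and a UTF-8 default when no BOM is present:

```python
import codecs

# Checked in this order because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(raw: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on a leading BOM, or the default if none is found."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return default


# A small csv saved as UTF-16 LE with a BOM, as in the failing case above
raw = codecs.BOM_UTF16_LE + "a,b\n1,2\n".encode("utf-16-le")
text = raw.decode(encoding_from_bom(raw))
```

Files without a BOM still need a guess (or a user-supplied encoding), so this only covers one common case.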

Re: Import csv file on django view

2020-07-24 Thread Ronaldo Mata
Hi, pandas requires you to know the encoding and delimiter beforehand when you
use pd.read_csv(filepath, encoding=" ", delimiter=" "), so I think it is the
same problem.

Re: Import csv file on django view

2020-07-24 Thread Jani Tiainen
Hi,

I can highly recommend using pandas to read csv. It does a pretty good job
of guessing a lot of things without extra config.

Of course it's one more extra dependency.


Re: Import csv file on django view

2020-07-24 Thread Ronaldo Mata
Yes, I will try it. I will let you know if anything comes up.

Re: Import csv file on django view

2020-07-22 Thread Liu Zheng
Hi,

Are you sure that the file used for detection is the same file that you
opened and decoded, and that gave you the incorrect result?

By the way, ascii is a proper subset of utf-8. If chardet said it is ascii,
decoding it using utf-8 should always work.

If your file contains non-ascii UTF-8 bytes, maybe it's a bug in chardet?
You can try it directly first, without mixing it with Django's requests.
Make sure you can detect and decode the file locally in a test program.
Then put it into the app.

If you share the file, I'm also glad to help you try it.

Re: Import csv file on django view

2020-07-22 Thread Ronaldo Mata
Hi Kovy, this is not solved. Liu Zheng, using
chardet.detect(request.FILES['file'].read()) returns "ascii" as the encoding,
which is not correct. For example, I uploaded a file encoded as utf-7 and the
result was wrong. Then I tried
request.FILES['file'].read().decode('ascii') and it does not work: it returns
bad data. For example, the string "@" comes back as "+AEA-".
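
One likely explanation for the "ascii" result: UTF-7 represents every character using only ASCII bytes, so at the byte level a UTF-7 file is indistinguishable from plain ASCII. The sketch below uses only the standard library (no chardet) and a made-up sample value to show the same bytes decoded three ways:

```python
# "+AEA-" is the UTF-7 escape sequence for "@" reported above.
raw = b"user+AEA-example.com"

# Every byte is below 128, so the data is also valid ASCII (and valid UTF-8).
all_ascii = all(byte < 128 for byte in raw)

as_ascii = raw.decode("ascii")   # escape sequence is left untouched
as_utf8 = raw.decode("utf-8")    # identical to the ASCII result
as_utf7 = raw.decode("utf-7")    # escape sequence becomes "@"
```

Decoding as ASCII does not fail here; it just leaves the escape sequences in the text, which matches the "bad data" described above.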

On Wed, Jul 22, 2020 at 11:16, Kovy Jacob wrote:

> I’m confused. I don’t know if I can help.
>
> On Jul 22, 2020, at 11:11 AM, Liu Zheng  wrote:
>
> Hi, glad you solved the problem. Yes, both request.FILES['file'] and
> the chardet file handler are binary handlers. A binary handler presents the
> raw data. chardet takes a sequence of raw data and then detects the encoding
> format. With its prediction, if you want to open that piece of data in text
> mode, you can use the .decode() method of the bytes object to
> get a Python string.
>
> On Wed, 22 Jul 2020 at 11:04 PM, Kovy Jacob  wrote:
>
>> That’s probably not the proper answer, but that’s the best I can do.
>> Sorry :-(
>>
>>
>> On Jul 22, 2020, at 10:46 AM, Ronaldo Mata 
>> wrote:
>>
>> Yes, the problem here is that the files will be loaded by the user, so I
>> don't know what delimiter I will receive. This is not a base command that I
>> am using, it is the logic that I want to incorporate in a view
>>
>> On Wed, Jul 22, 2020 at 10:43, Kovy Jacob wrote:
>>
>>> Ah, so is the problem that you don’t always know what the delimiter is
>>> when you read it? If yes, what is the use case for this? You might not need
>>> a universal solution, maybe just put all the info into a csv yourself,
>>> manually.
>>>
>>> On Jul 22, 2020, at 10:39 AM, Ronaldo Mata 
>>> wrote:
>>>
>>> Hi Kovy, I'm using the csv module, but I need to handle the delimiters of
>>> the files: sometimes they come separated by ",", others by ";" and rarely by
>>> "|"
>>>
>>> On Wed, Jul 22, 2020 at 10:28, Kovy Jacob wrote:
>>>
>>>> Could you just use the standard python csv module?
>>>>
>>>> On Jul 22, 2020, at 10:25 AM, Ronaldo Mata
>>>> wrote:
>>>>
>>>> Hi Liu, thanks for your answer.
>>>>
>>>> This has been a headache. I am trying to read the file using
>>>> csv.DictReader. Initially I had an error trying to get the dict keys when
>>>> iterating over the rows, and I thought it could be the encoding (for this
>>>> reason I wanted to prepare the view to use the correct encoding). That is
>>>> why I asked my question.
>>>>
>>>> 1) Your first approach doesn't work: if I send a utf-8 file, chardet
>>>> returns ascii as the encoding. It seems request.FILES['file'].read() returns
>>>> binary data with that encoding.
>>>>
>>>> 2) In the end I realized that the problem was the delimiter of the csv,
>>>> but predicting it is another problem.
>>>>
>>>> Anyway, it was a task that I had to do and that was my limitation. I
>>>> think there must be a library that does all this; uploading a csv file is
>>>> common practice in many web apps.
>>>>
>>>> On Tue, Jul 21, 2020 at 13:47, Liu Zheng wrote:

>>>>> Hi. First of all, I think it's impossible to perfectly detect encoding
>>>>> without further information. See the answer in this SO post:
>>>>> https://stackoverflow.com/questions/436220/how-to-determine-the-encoding-of-text
>>>>> There are many packages and tools to help detect the encoding format, but
>>>>> keep in mind that they only give educated guesses. (Most of the time, the
>>>>> guess is correct, but do check the dev page to see whether there are known
>>>>> issues related to your problem.)
>>>>>
>>>>> Now let's say you have decided to use chardet. Check its doc page for
>>>>> the usage: https://chardet.readthedocs.io/en/latest/usage.html#usage
>>>>> You'll have more than one solution. Here are some examples:
>>>>>
>>>>> 1. If the files uploaded to your server are all expected to be small
>>>>> csv files (less than a few MB, and not many users upload concurrently),
>>>>> you can do the following:
>>>>>
>>>>> # in the view that handles the uploaded file (assume the file input
>>>>> # name is just "file")
>>>>> file_content = request.FILES['file'].read()
>>>>> chardet.detect(file_content)
>>>>>
>>>>> 2. Also, chardet seems to support incremental (line-by-line) detection:
>>>>> https://chardet.readthedocs.io/en/latest/usage.html#example-detecting-encoding-incrementally
>>>>>
>>>>> Given this, we can also read from request.FILES line by line and pass
>>>>> each line to chardet:
>>>>>
> from chardet.universaldetector import UniversalDetector
>
> #somewhere in a view function
> detector = UniversalDetector()
> file_handle = request.FILES['file']
> for line in file_handle:
> detector.feed(line)
> if detector.done: break
> detector.close()
> # result available as a dict at detector.result
>
>
>
>
>
> On Tuesday, July 21, 2020 at 7:09:35 AM UTC+8, Ronaldo Mata wrote:
>>
>> How to deal with encoding when you try to read a csv file on view.
>>

Re: Import csv file on django view

2020-07-22 Thread Liu Zheng
What I meant was that you can only feed binary data or binary file handles to
chardet. You can decode the binary data according to the detection results
afterward.


Re: Import csv file on django view

2020-07-22 Thread Liu Zheng
Hi, glad you solved the problem. Yes, both request.FILES['file'] and the
chardet file handle are binary handles. A binary handle presents the
raw data. chardet takes a sequence of raw bytes and then detects the encoding.
With its prediction, if you want to read that piece of data as text, you can
use the .decode() method of the bytes object to get a Python string.
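
A minimal sketch of that decode step, assuming the detector's guess is passed in as a plain string. The helper name decode_upload and the fallback list are illustrative assumptions, not part of chardet or Django. Note that a guess of "ascii" for a UTF-8 file is not a wrong answer when the sample contains only ASCII bytes, since ASCII is a subset of UTF-8.

```python
def decode_upload(raw, guessed=None, fallbacks=("utf-8-sig", "cp1252", "latin-1")):
    """Decode uploaded bytes, trying the detector's guess first.

    `guessed` is whatever encoding name a detector reported (may be None).
    The fallback order is an assumption; adjust it to the files you expect.
    Returns a (text, encoding_used) tuple.
    """
    candidates = ([guessed] if guessed else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    raise ValueError("no candidate encoding could decode the data")


text, used = decode_upload("café".encode("utf-8"), guessed="ascii")
print(used, text)
```

Because latin-1 maps every byte to a character, keeping it last means the function always returns something, although the result may be mojibake if the real encoding was different.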

On Wed, 22 Jul 2020 at 11:04 PM, Kovy Jacob  wrote:

> That’s probably not the proper answer, but that’s the best I can do. Sorry
> :-(
>
>
> On Jul 22, 2020, at 10:46 AM, Ronaldo Mata 
> wrote:
>
> Yes, the problem here is that the files will be loaded by the user, so I
> don't know what delimiter I will receive. This is not a base command that I
> am using, it is the logic that I want to incorporate in a view
>
> On Wed, Jul 22, 2020 at 10:43, Kovy Jacob wrote:
>
>> Ah, so is the problem that you don’t always know what the delimiter is
>> when you read it? If yes, what is the use case for this? You might not need
>> a universal solution, maybe just put all the info into a csv yourself,
>> manually.
>>
>> On Jul 22, 2020, at 10:39 AM, Ronaldo Mata 
>> wrote:
>>
>> Hi Kovy, I'm using the csv module, but I need to handle the delimiters of
>> the files: sometimes they come separated by ",", other times by ";" and
>> rarely by "|".
>>
>> On Wed, Jul 22, 2020 at 10:28, Kovy Jacob wrote:
>>
>>> Could you just use the standard python csv module?
>>>
>>> On Jul 22, 2020, at 10:25 AM, Ronaldo Mata 
>>> wrote:
>>>
>>> Hi Liu, thanks for your answer.
>>>
>>> This has been a headache. I am trying to read the file using
>>> csv.DictReader. Initially I had an error trying to get the dict keys when
>>> iterating over the rows, and I thought it could be the encoding (for this
>>> reason I wanted to prepare the view to use the correct encoding). That is
>>> why I asked my question.
>>>
>>> 1) Your first approach doesn't work: if I send a UTF-8 file, chardet
>>> returns ascii as the encoding. It seems request.FILES['file'].read()
>>> returns binary data with that encoding.
>>>
>>> 2) In the end I realized that the problem was the delimiter of the csv,
>>> but predicting it is another problem.
>>>
>>> Anyway, it was a task that I had to do and that was my limitation. I
>>> think there must be a library that does all this; uploading a csv file is
>>> common practice in many web apps.
>>>
>>> On Tue, Jul 21, 2020 at 13:47, Liu Zheng wrote:
>>>
>>>> Hi. First of all, I think it's impossible to perfectly detect encoding
>>>> without further information. See the answer in this SO post:
>>>> https://stackoverflow.com/questions/436220/how-to-determine-the-encoding-of-text
>>>> There are many packages and tools to help detect the encoding, but keep
>>>> in mind that they only give educated guesses. (Most of the time, the
>>>> guess is correct, but do check the dev page to see whether there are known
>>>> issues related to your problem.)
>>>>
>>>> Now let's say you have decided to use chardet. Check its doc page for
>>>> the usage: https://chardet.readthedocs.io/en/latest/usage.html#usage You'll
>>>> have more than one solution. Here are some examples:
>>>>
>>>> 1. If the files uploaded to your server are all expected to be small
>>>> csv files (less than a few MB and not many users do it concurrently), you
>>>> can do the following:
>>>>
>>>> # in the view that handles the uploaded file (assume the file input
>>>> # name is just "file")
>>>> import chardet
>>>> file_content = request.FILES['file'].read()
>>>> result = chardet.detect(file_content)  # dict with an 'encoding' key
>>>>
>>>> 2. Also, chardet seems to support incremental (line-by-line) detection:
>>>> https://chardet.readthedocs.io/en/latest/usage.html#example-detecting-encoding-incrementally
>>>>
>>>> Given this, we can also read from request.FILES line by line and pass
>>>> each line to chardet:
>>>>
>>>> from chardet.universaldetector import UniversalDetector
>>>>
>>>> # somewhere in a view function
>>>> detector = UniversalDetector()
>>>> file_handle = request.FILES['file']
>>>> for line in file_handle:
>>>>     detector.feed(line)
>>>>     if detector.done:
>>>>         break
>>>> detector.close()
>>>> # result available as a dict at detector.result
>>>>
>>>> On Tuesday, July 21, 2020 at 7:09:35 AM UTC+8, Ronaldo Mata wrote:
>>>>>
>>>>> How to deal with encoding when you try to read a csv file in a view.
>>>>>
>>>>> I have a view to upload a csv file; in this view I read the file and
>>>>> save each row as a new record.
>>>>>
>>>>> My bug appears when I try to upload a csv file with a different encoding
>>>>> (not UTF-8).
>>>>>
>>>>> How can I handle this in Django (using request.FILES)? I was researching
>>>>> and I found chardet, but I don't know how to pass it a request.FILES. I
>>>>> need help, please.
>>>>>


Re: Import csv file on django view

2020-07-22 Thread Kovy Jacob
I’m confused. I don’t know if I can help.


Re: Import csv file on django view

2020-07-22 Thread Kovy Jacob
Cool! I’m so happy I was able to help you!! Good luck!


Re: Import csv file on django view

2020-07-22 Thread Kovy Jacob
That’s probably not the proper answer, but that’s the best I can do. Sorry :-(


Re: Import csv file on django view

2020-07-22 Thread Kovy Jacob
Maybe first use the standard open() to read the file into a variable, search 
that variable for the different delimiters using standard string manipulation, 
and so on, and then open it using the corresponding delimiter.
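
The idea above could be sketched roughly as follows, assuming the file has already been decoded to a string and that only the three delimiters mentioned in this thread can occur. The names guess_delimiter and read_rows are hypothetical helpers written for this illustration.

```python
import csv
import io

CANDIDATES = (",", ";", "|")  # the delimiters mentioned in this thread


def guess_delimiter(text, candidates=CANDIDATES, default=","):
    """Pick the candidate that occurs most often in the header line."""
    first_line = text.splitlines()[0] if text else ""
    counts = {d: first_line.count(d) for d in candidates}
    best = max(counts, key=counts.get)
    # Fall back to the default when no candidate appears at all.
    return best if counts[best] > 0 else default


def read_rows(text):
    """Parse already-decoded CSV text with the guessed delimiter."""
    delimiter = guess_delimiter(text)
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


print(read_rows("name|age\nAna|30\n"))
```

Counting characters in the header is a deliberately simple heuristic. It can be fooled by a quoted header field that itself contains one of the candidates, so treat the result as a guess.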


Re: Import csv file on django view

2020-07-22 Thread Ronaldo Mata
Yes, the problem here is that the files will be uploaded by the user, so I
don't know what delimiter I will receive. This is not a base command that I
am using; it is logic that I want to incorporate into a view.
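
One way such view logic might look, as a sketch only: the helper below takes any seekable binary file object and lets the standard library's csv.Sniffer choose among the expected delimiters. The name iter_csv_rows, the 4096-character sample size and the utf-8-sig default are assumptions made for this example. In a view, the upload could for instance be wrapped as io.BytesIO(request.FILES['file'].read()) before being passed in, which is reasonable for small files.

```python
import csv
import io


def iter_csv_rows(binary_file, encoding="utf-8-sig", delimiters=",;|"):
    """Yield one dict per row from a seekable binary CSV file object."""
    # newline="" lets the csv module handle line endings itself.
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    sample = text_file.read(4096)
    text_file.seek(0)
    try:
        # Restrict sniffing to the delimiters users are expected to send.
        dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
    except csv.Error:
        # The sniffer could not decide: fall back to comma-separated.
        dialect = csv.excel
    yield from csv.DictReader(text_file, dialect=dialect)


upload = io.BytesIO("name;age\nAna;30\nLuis;41\n".encode("utf-8"))
for row in iter_csv_rows(upload):
    print(row["name"], row["age"])
```

The sniffer is also a heuristic, so for files with very few rows or a single column it may raise csv.Error, which is why the comma fallback is there.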

El mié., 22 jul. 2020 a las 10:43, Kovy Jacob ()
escribió:

> Ah, so is the problem that you don’t always know what the delimiter is
> when you read it? If yes, what is the use case for this? You might not need
> a universal solution, maybe just put all the info into a csv yourself,
> manually.
>
> On Jul 22, 2020, at 10:39 AM, Ronaldo Mata 
> wrote:
>
> Hi Kovy, I'm using csv module, but I need to handle the delimiters of the
> files, sometimes you come separated by "," others by ";" and rarely by "|"
>
> El mié., 22 jul. 2020 a las 10:28, Kovy Jacob ()
> escribió:
>
>> Could you just use the standard python csv module?
>>
>> On Jul 22, 2020, at 10:25 AM, Ronaldo Mata 
>> wrote:
>>
>> Hi Liu thank for your answer.
>>
>> This has been a headache, I am trying to read the file using
>> csv.DictReader initially i had an error trying to get the dict keys when
>> iterating by rows, and i thought it could be encoding (for this reason i
>> wanted to prepare the view to use the correct encoding). for that reason I
>> asked my question.
>>
>> 1) your first approach doesn't work, if i send utf-8 file, chardet
>> returns ascii as encoding. it seems request.FILES ['file']. read () returns
>> a binary with that encoding.
>>

Re: Import csv file on django view

2020-07-22 Thread Kovy Jacob
Ah, so the problem is that you don't always know what the delimiter is when
you read the file? If so, what is the use case? You might not need a universal
solution; maybe you could just put all the data into a CSV yourself, manually.


Re: Import csv file on django view

2020-07-22 Thread Ronaldo Mata
Hi Kovy, I'm using the csv module, but I need to handle the delimiters of the
files: sometimes the fields come separated by ",", other times by ";", and
rarely by "|".
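
For the delimiter question, one option worth trying is csv.Sniffer from the
standard library, restricted to the three delimiters mentioned above. This is
only a minimal sketch, assuming the upload has already been decoded to text;
sniff_delimiter, the comma fallback, and the sample data are made up for
illustration.

```python
import csv
import io


def sniff_delimiter(sample, candidates=",;|"):
    """Guess the delimiter of a decoded text sample, limited to `candidates`."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide; falling back to a comma is an assumption.
        return ","


# Hypothetical sample standing in for the first lines of an uploaded file.
sample = "name;city;age\nAna;Lima;31\nLuis;Quito;27\n"
delimiter = sniff_delimiter(sample)
rows = list(csv.DictReader(io.StringIO(sample), delimiter=delimiter))
```

The sniffer is a heuristic, so on short or irregular samples it can fail or
guess wrong; that is why the sketch keeps an explicit fallback.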


Re: Import csv file on django view

2020-07-22 Thread Kovy Jacob
Could you just use the standard Python csv module?

Re: Import csv file on django view

2020-07-22 Thread Ronaldo Mata
Hi Liu, thanks for your answer.

This has been a headache. I am trying to read the file using csv.DictReader.
Initially I got an error when trying to get the dict keys while iterating
over the rows, and I thought it could be the encoding, so I wanted to prepare
the view to use the correct encoding. That is why I asked my question.

1) Your first approach doesn't work: if I send a UTF-8 file, chardet returns
ascii as the encoding. It seems request.FILES['file'].read() returns bytes
with that encoding.

2) In the end I realized that the problem was the delimiter of the CSV, but
predicting it is another problem.

Anyway, it was a task that I had to do, and that was my limitation. I think
there must be a library that does all this; uploading a CSV file is common
practice in many web apps.
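
A note on point 1: a file that contains only ASCII bytes is byte-for-byte
identical whether it was saved as ASCII or as UTF-8, so an "ascii" guess for
such a file is consistent with the bytes, and decoding it as UTF-8 gives the
same text. Below is a small sketch of treating the two as interchangeable.
normalize_guess is a hypothetical helper, and defaulting to UTF-8 when there
is no guess at all is an assumption.

```python
def normalize_guess(encoding):
    """Map an 'ascii' guess (or a missing guess) to UTF-8; hypothetical helper."""
    if encoding is None or encoding.lower() == "ascii":
        return "utf-8"
    return encoding


# A file saved as UTF-8 that happens to contain only ASCII characters.
raw = b"name;city\nAna;Lima\n"

# Both decodings produce exactly the same text for these bytes.
same_text = raw.decode("ascii") == raw.decode("utf-8")

text = raw.decode(normalize_guess("ascii"))
```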



Re: Import csv file on django view

2020-07-21 Thread Liu Zheng
Hi. First of all, I think it's impossible to detect an encoding perfectly
without further information. See the answer in this SO post:
https://stackoverflow.com/questions/436220/how-to-determine-the-encoding-of-text

There are many packages and tools that help detect the encoding, but keep in
mind that they only give educated guesses. (Most of the time the guess is
correct, but do check the dev page to see whether there are known issues
related to your problem.)

Now let's say you have decided to use chardet. Check its doc page for the
usage: https://chardet.readthedocs.io/en/latest/usage.html#usage
You have more than one option. Here are some examples:

1. If the files uploaded to your server are all expected to be small CSV
files (less than a few MB, and not many users uploading concurrently), you
can do the following:

    # In the view that handles the uploaded file
    # (assuming the file input name is just "file")
    import chardet

    file_content = request.FILES['file'].read()
    result = chardet.detect(file_content)  # dict; the guess is result['encoding']
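
Once an encoding has been chosen (from chardet's guess or otherwise), the
bytes can be decoded and handed to csv.DictReader. Here is a minimal sketch
for the small-file case, using only the standard library. parse_csv_bytes and
the sample bytes are hypothetical, and raw stands in for what
request.FILES['file'].read() returns.

```python
import csv
import io


def parse_csv_bytes(raw, encoding="utf-8", delimiter=","):
    """Decode the whole upload in memory and return its rows as dicts."""
    text = raw.decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter))


# Hypothetical Latin-1 upload; as UTF-8 these bytes would fail to decode.
raw = "nombre,ciudad\nJosé,Bogotá\n".encode("latin-1")
rows = parse_csv_bytes(raw, encoding="latin-1")
```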

2. Also, chardet seems to support incremental (line-by-line) detection:
https://chardet.readthedocs.io/en/latest/usage.html#example-detecting-encoding-incrementally

Given this, we can also read from request.FILES line by line and pass each
line to chardet:

    from chardet.universaldetector import UniversalDetector

    # Somewhere in a view function
    detector = UniversalDetector()
    file_handle = request.FILES['file']
    for line in file_handle:
        detector.feed(line)
        if detector.done:
            break
    detector.close()
    # The result is available as a dict at detector.result
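
After an encoding has been settled on, the upload can also be parsed without
loading it all into memory, by wrapping the binary stream in io.TextIOWrapper.
This is a rough sketch, assuming a seekable binary file object.
iter_csv_rows is a hypothetical name, and io.BytesIO stands in here for the
uploaded file.

```python
import csv
import io


def iter_csv_rows(binary_file, encoding, delimiter=","):
    """Yield rows as dicts, decoding the binary stream lazily."""
    binary_file.seek(0)  # rewind in case a detector already consumed the stream
    text_stream = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    try:
        yield from csv.DictReader(text_stream, delimiter=delimiter)
    finally:
        text_stream.detach()  # leave the underlying file open for the caller


# Hypothetical UTF-16 upload (the kind that is not valid UTF-8).
stream = io.BytesIO("a;b\n1;2\n".encode("utf-16"))
rows = list(iter_csv_rows(stream, "utf-16", delimiter=";"))
```

In a view, the object passed in would presumably be the upload's underlying
file (request.FILES['file'].file); that part is untested here.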





On Tuesday, July 21, 2020 at 7:09:35 AM UTC+8, Ronaldo Mata wrote:
>
> How do I deal with encoding when reading a CSV file in a view?
>
> I have a view to upload a CSV file; in this view I read the file and save
> each row as a new record.
>
> My bug appears when I try to upload a CSV file with a different encoding
> (not UTF-8).
>
> How do I handle this in Django (using request.FILES)? I was researching and
> I found chardet, but I don't know how to pass it a request.FILES. I need
> help, please.
>
