Hi,

I found a partial solution. I now use this code:

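    # params[:file].tempfile is the Tempfile that holds the uploaded data
    file = params[:file].tempfile
    # Label the raw bytes as UTF-8 (this relabels the string, it does not convert it)
    content = file.read.force_encoding("UTF-8")
    # Remove the UTF-8 byte order mark (bytes EF BB BF)
    content.gsub!("\xEF\xBB\xBF".force_encoding("UTF-8"), '')
    @csv = CSV.new(content, :headers => false, :col_sep => ";")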

I found it here: 
http://stackoverflow.com/questions/5011504/is-there-a-way-to-remove-the-bom-from-a-utf-8-encoded-file
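
As far as I can tell, Ruby 1.9 can also skip the BOM by itself if the file is opened with the "r:bom|utf-8" mode. Here is a minimal sketch of that idea. It assumes the upload is reachable through a path (for example params[:file].tempfile.path); the helper name read_utf8_without_bom and the sample data are made up for illustration:

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: read a file as UTF-8 and let Ruby drop a leading BOM, if there is one.
def read_utf8_without_bom(path)
  # "bom|utf-8" tells Ruby to look for a BOM, skip it when present,
  # and tag the data that follows as UTF-8.
  File.open(path, "r:bom|utf-8") { |f| f.read }
end

# Demo: a temporary file stands in for the uploaded csv.
Tempfile.create("upload") do |tmp|
  tmp.binmode
  tmp.write("\xEF\xBB\xBFname;city\nk\xC3\xBChler;Berlin\n".b)
  tmp.flush

  content = read_utf8_without_bom(tmp.path)
  rows = CSV.parse(content, col_sep: ";")
  p rows.first
end
```

Files without a BOM are read unchanged, so nothing needed is stripped from them. This does not convert anything, though: a file in another encoding is still only labelled as UTF-8.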

There is still a problem when the source file is not UTF-8 encoded!
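
One possible way around this, as a rough sketch only: check whether the bytes are valid UTF-8 and, if they are not, transcode them from a fallback encoding. Ruby does not detect the encoding of a file; file.external_encoding only reports the encoding Ruby assumes by default, which would explain why it always says CP850 on my machine. The helper name to_utf8 and the choice of Windows-1252 as fallback are assumptions for illustration, not something I know about the real source files:

```ruby
# Hypothetical helper: return the uploaded bytes as a UTF-8 string.
# ASSUMPTION: anything that is not valid UTF-8 is Windows-1252; change `fallback` as needed.
def to_utf8(bytes, fallback = "Windows-1252")
  text = bytes.dup.force_encoding("UTF-8")
  if text.valid_encoding?
    # Already UTF-8: only drop a leading BOM (U+FEFF), if present.
    text.sub(/\A\uFEFF/, "")
  else
    # Not UTF-8: reinterpret the bytes in the fallback encoding and convert.
    # Bytes that cannot be converted are replaced instead of raising.
    bytes.dup.force_encoding(fallback).encode("UTF-8", invalid: :replace, undef: :replace)
  end
end

# Demo: the same word once as UTF-8 with BOM, once as Windows-1252.
puts to_utf8("\xEF\xBB\xBFk\xC3\xBChler".b)
puts to_utf8("k\xFChler".b)
```

In the controller this would be used as content = to_utf8(file.read) before the string is passed to CSV.new. The validity check is only a heuristic, so a file in some third encoding would still come out wrong.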



On 5 Jul., 10:14, Sebastian <[email protected]> wrote:
> OK,
>
> thank you for your reply! In the meantime I figured out why this was
> working without errors in my first code!
>
> There I had some regex checks before saving each row into the
> database. That means the first row always got skipped, because the
> unicode identifiers didn't match the regex.
>
> Now I know where my fault is, but I don't really know how to solve it.
>
> If the source csv is in utf-8 I can of course strip the first three
> chars. But if it is in another encoding, that means I strip off chars
> that I need. How can I check which encoding the file has? I tried this
> here, but that always gives me CP850 as encoding:
>
> file = File.open("my.csv")
> puts file.external_encoding.name
>
> Or is there a way to transform a file before uploading? I use
> file.temp for uploading.
>
> Cheers,
> Sebastian
>
> On 4 Jul., 18:31, Walter Lee Davis <[email protected]> wrote:
>
> > Unicode uses them to indicate to the application reading the text file  
> > which order the following bytes are in. Since UTF-8 uses compound  
> > characters to indicate the scary-high end of the unicode character  
> > table (two bytes needed to encode some characters) the order that the  
> > bits arrived in is of critical importance. Text files may be little-
> > endian or big-endian, and unless you know what order to expect, you  
> > can't really know.
>
> > Walter
>
> > On Jul 4, 2011, at 3:02 AM, Sebastian wrote:
>
> > > Thank you for your reply!
>
> > > Stripping the first chars is possible of course, but I don't
> > > understand why these chars are there.
>
> > > It was working before! I could just upload the utf-8 csv and everything
> > > was working great before. I don't really know what I changed that now
> > > these chars are appearing.
>
> > > Sebastian
>
> > > On 1 Jul., 15:12, Frederick Cheung <[email protected]> wrote:
> > >> On Jul 1, 11:48 am, Sebastian <[email protected]> wrote:
>
> > >>> OK,
>
> > >>> it was working perfectly when I just made sure that my csv file is
> > >>> in utf-8 encoding format.
>
> > >>> I deleted some of my program, so I had to write a lot of stuff
> > >>> again.
>
> > >>> If I now upload a csv file which is in utf-8 format, then every time
> > >>> the first three characters in the first row are: \xEF\xBB\xBF
>
> > >> That's a utf BOM: a magic unicode character that tells whoever is
> > >> reading the stream what endianness is and also allows to tell UTF8
> > >> apart from utf16
> > >> You can safely strip them from the file.
>
> > >>> I read that this is something about unicode and ordering, but I
> > >>> don't know where these hex chars come from.
>
> > >>> Also every german special character is also shown in this hex code,
> > >>> e.g. "k\xC3\xBChler" should be "kühler"
>
> > >> That is probably just an output thing if you are seeing this in a
> > >> terminal window- \xC3\xBC is the utf8 sequence for ü
>
> > >> Fred
>
> > >>> If I use files in other encodings there are not these three chars in
> > >>> the beginning, but every special char is "?"
>
> > >>> Has anyone an idea where this comes from?
>
> > >>> Cheers,
> > >>> Sebastian
>
> > >>> On 22 Jun., 13:26, Sebastian <[email protected]> wrote:
>
> > >>>> file.temp is an object. I have a form where a csv can be  
> > >>>> uploaded, but
> > >>>> it is never stored. That's why I use tempfile. That means that I
> > >>>> probably have no path to use in that method.
>
> > >>>> BUT, the open and foreach method for the CSV class is working  
> > >>>> with an
> > >>>> object whenever I don't have a german special character in my csv  
> > >>>> file
> > >>>> or when my csv file is already in utf-8 encoding format.
>
> > >>>> On 22 Jun., 12:05, Chirag Singhal <[email protected]> wrote:
>
> > >>>>> What does file.tempfile return?
> > >>>>> If it is a file object, then we have a problem, we need to pass  
> > >>>>> in file path
> > >>>>> here.
> > >>>>> So call path on the file object and pass that as the first  
> > >>>>> argument.
>
> > > --
> > > You received this message because you are subscribed to the Google  
> > > Groups "Ruby on Rails: Talk" group.
> > > To post to this group, send email to rubyonrails-
> > > [email protected].
> > > To unsubscribe from this group, send email to 
> > > [email protected]
> > > .
> > > > For more options, visit this group at
> > > > http://groups.google.com/group/rubyonrails-talk?hl=en.

