OK,
thank you for your reply! In the meantime I figured out why this was
working without errors in my first code!
There I had some regex checks before saving each row into the
database. That means the first row always got skipped, because the
Unicode identifiers (the BOM) at its start didn't match the regex.
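
The effect can be reproduced in a few lines. This is only a minimal sketch: ROW_PATTERN is a hypothetical check that expects each line to start with a numeric ID and a semicolon, since the real regex checks are not shown in this thread.

```ruby
# Hypothetical row check (an assumption, not the original code):
# a valid line starts with a numeric ID followed by a semicolon.
ROW_PATTERN = /\A\d+;/

# First line of a UTF-8 file that was saved with a BOM (EF BB BF).
WITH_BOM = "\xEF\xBB\xBF1;k\xC3\xBChler"
# Any later line of the same file.
WITHOUT_BOM = "2;k\xC3\xBChler"

def valid_row?(line)
  !(line =~ ROW_PATTERN).nil?
end

puts valid_row?(WITHOUT_BOM)                     # matches
puts valid_row?(WITH_BOM)                        # BOM sits before the ID, so no match
puts valid_row?(WITH_BOM.sub(/\A\uFEFF/, ""))    # matches once the BOM is removed
```

The BOM is the single character U+FEFF, so anchoring the substitution at the start of the string removes it only when it is actually there.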
Now I know where my mistake is, but I don't really know how to solve it.
If the source CSV is in UTF-8 I can of course strip the first three
bytes. But if it is in another encoding, I would strip off chars
that I need. How can I check which encoding the file has? I tried this,
but it always gives me CP850 as the encoding:
file = File.open("my.csv")
puts file.external_encoding.name
Or is there a way to transform a file before uploading? I use
file.temp for uploading.
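
One possible approach, sketched here under assumptions: look at the first bytes of the file and strip a BOM only if one is really there. The helper names sniff_bom and read_without_bom are made up for this example, and the ISO-8859-1 fallback is a guess, because a file without a BOM does not say which encoding it uses. external_encoding is no help for this, since it reports the encoding Ruby assumes by default (CP850 on that console), not what the bytes contain.

```ruby
# Known byte order marks. The four-byte marks come first so that
# UTF-32LE (FF FE 00 00) is not mistaken for UTF-16LE (FF FE).
BOMS = [
  ["\x00\x00\xFE\xFF".b, "UTF-32BE"],
  ["\xFF\xFE\x00\x00".b, "UTF-32LE"],
  ["\xEF\xBB\xBF".b,     "UTF-8"],
  ["\xFE\xFF".b,         "UTF-16BE"],
  ["\xFF\xFE".b,         "UTF-16LE"]
]

# Returns [encoding_name_or_nil, bom_length_in_bytes] for a string of raw bytes.
def sniff_bom(bytes)
  head = bytes.byteslice(0, 4).to_s.b
  BOMS.each do |bom, name|
    return [name, bom.bytesize] if head.start_with?(bom)
  end
  [nil, 0]
end

# Reads the file as raw bytes, drops a BOM only if one is present, and
# returns the content transcoded to UTF-8. The fallback encoding is an
# assumption for files that carry no BOM.
def read_without_bom(path, fallback = "ISO-8859-1")
  raw = File.binread(path)
  name, skip = sniff_bom(raw)
  raw.byteslice(skip..-1).force_encoding(name || fallback).encode("UTF-8")
end
```

With an upload this could be called as read_without_bom(file.tempfile.path), assuming the upload object exposes its tempfile that way. If only UTF-8 files are expected, opening the file with the mode "r:BOM|UTF-8" also skips a leading UTF-8 BOM when there is one.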
Cheers,
Sebastian
On 4 Jul., 18:31, Walter Lee Davis <[email protected]> wrote:
> Unicode uses them to indicate to the application reading the text file
> which order the following bytes are in. UTF-16 and UTF-32 use code
> units of two or four bytes, so the order that the bytes arrive in is
> of critical importance. Such text files may be little-endian or
> big-endian, and unless you know what order to expect, you can't
> really know. UTF-8 has no byte-order ambiguity, so its BOM (EF BB BF)
> only marks the file as UTF-8.
>
> Walter
>
> On Jul 4, 2011, at 3:02 AM, Sebastian wrote:
>
> > Thank you for your reply!
>
> > Stripping the first chars is possible of course, but I don't
> > understand why these chars are there.
>
> > It was working before! I could just upload the utf-8 csv and everything
> > was working great. I don't really know what I changed so that
> > these chars now appear.
>
> > Sebastian
>
> > On 1 Jul., 15:12, Frederick Cheung <[email protected]> wrote:
> >> On Jul 1, 11:48 am, Sebastian <[email protected]> wrote:
>
> >>> OK,
>
> >>> it was working perfectly when I just made sure that my csv file is
> >>> in utf-8 encoding format.
>
> >>> I deleted some of my program, so I had to write a lot of stuff
> >>> again.
>
> >>> If I now upload a csv file which is in utf-8 format, the first
> >>> three characters of the first row are always: \xEF\xBB\xBF
>
> >> That's a UTF BOM: a magic Unicode character that tells whoever is
> >> reading the stream what the endianness is and also allows them to
> >> tell UTF-8 apart from UTF-16.
> >> You can safely strip it from the file.
>
> >>> I read that this has something to do with unicode and ordering,
> >>> but I don't know where these hex chars come from.
>
> >>> Every German special character is also shown in this hex code,
> >>> e.g. "k\xC3\xBChler" should be "kühler"
>
> >> That is probably just an output thing if you are seeing this in a
> >> terminal window: \xC3\xBC is the UTF-8 sequence for ü
>
> >> Fred
>
> >>> If I use files in other encodings, these three chars are not there
> >>> at the beginning, but every special char is "?"
>
> >>> Does anyone have an idea where this comes from?
>
> >>> Cheers,
> >>> Sebastian
>
> >>> On 22 Jun., 13:26, Sebastian <[email protected]> wrote:
>
> >>>> file.temp is an object. I have a form where a csv can be
> >>>> uploaded, but it is never stored. That's why I use tempfile.
> >>>> That means that I probably have no path to use in that method.
>
> >>>> BUT, the open and foreach methods of the CSV class work with an
> >>>> object whenever I don't have a German special character in my csv
> >>>> file or when my csv file is already in utf-8 encoding format.
>
> >>>> On 22 Jun., 12:05, Chirag Singhal <[email protected]> wrote:
>
> >>>>> What does file.tempfile return?
> >>>>> If it is a file object, then we have a problem, we need to pass
> >>>>> in the file path here.
> >>>>> So call path on the file object and pass that as the first
> >>>>> argument.
--
You received this message because you are subscribed to the Google Groups "Ruby
on Rails: Talk" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en.