These are actually Scala / Java questions.

On Sat, Jun 21, 2014 at 1:08 AM, anny9699 <anny9...@gmail.com> wrote:
> 1) One of the separators is '\004', which could be recognized by python or R
> or Hive, however Spark seems can't recognize this one and returns a symbol
> looking like '?'. Also this symbol is not a question mark and I don't know
> how to parse.
(The \004 octal syntax appears to be deprecated, but it works.) It's not turned into ?; that is just how the shell displays non-printing characters:

scala> val c = '\004'
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
c: Char = ?

scala> c.toInt
res2: Int = 4

All of which is correct. Is it causing any actual problem?

> 2) Some of the separator are composed of several Chars, like "} =>". If I
> use str.split(Array('}', '=>')), it will separate the string but with many
> white spaces included in the middle. Is there a good way that I could
> separate by String instead of by Array of Chars?

Your example doesn't compile ('=>' is not a valid Char literal), but I assume the argument should be an array of the 3 chars '}', '=' and '>'. Splitting on an array of Chars treats each Char as a separate delimiter, so split returns an empty string between any two adjacent delimiters. If you don't want those, you can drop them with str.split(...).filterNot(_.isEmpty)
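On the first question, a minimal sketch showing that the \004 character works as an ordinary delimiter even though the REPL prints it as ?. It builds the char as 4.toChar to sidestep the deprecated octal literal; the object name CtrlDSplit and the sample record are hypothetical.

```scala
object CtrlDSplit {
  // Char code 4 (ASCII EOT): the same character the deprecated octal literal '\004' denotes.
  val Sep: Char = 4.toChar

  // Split a record on the control character; it behaves like any other delimiter.
  def fields(line: String): Array[String] = line.split(Sep)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample record joined with the separator.
    val line = Seq("alice", "42", "NY").mkString(Sep.toString)
    println(Sep.toInt)                  // 4, even though the REPL shows the char as ?
    println(fields(line).mkString(",")) // alice,42,NY
  }
}
```

If the fields come back as expected, the ? seen in the shell is purely a display artifact.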
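On the second question, another option besides filtering is to split on the whole separator string. java.lang.String.split takes a regular expression, so a literal separator such as "} =>" is safest wrapped in java.util.regex.Pattern.quote so that any regex metacharacters are matched literally. This is a sketch under that assumption; the object and method names are hypothetical, and whether surrounding whitespace should also be trimmed depends on your data.

```scala
import java.util.regex.Pattern

object MultiCharSplit {
  // Split on the exact multi-character separator, not on each of its chars.
  def splitOn(line: String, sep: String): Array[String] =
    line.split(Pattern.quote(sep))

  // Char-set splitting for comparison: every char in seps is its own delimiter,
  // so adjacent delimiters produce empty strings, which filterNot removes.
  def splitOnChars(line: String, seps: Array[Char]): Array[String] =
    line.split(seps).filterNot(_.isEmpty)

  def main(args: Array[String]): Unit = {
    val line = "a} =>b} =>c" // hypothetical sample input
    println(splitOn(line, "} =>").toList)
    println(splitOnChars(line, Array('}', ' ', '=', '>')).toList)
  }
}
```

Splitting on the quoted string keeps the separator's characters together, so nothing needs to be filtered afterwards.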