Dear All,

I am running Spark 1.5.2 through the SparkR front end. SparkR returned an
error message when I tried to process a very simple toy data set that
contains some Chinese characters. The error message looks like this:

"> head(df)
Error in rawToChar(string) :
  embedded nul in string:
'\xd5\003\024\006k\xb0\b?\xe5i\xfeH\0DI\xd8\0W\xc5q10\x8c\t'"

There is probably a simple solution to this but I feel very stuck at the
moment.

I am attaching the R source code as well as the toy data. I hope someone
can give me a hand.

Thank you very much.

Best,
Shige
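
# R source code (attachment)
library(SparkR)

# Load the spark-csv package when the SparkR shell backend starts.
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')

# Start Spark and create the SQL context.
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Read the toy CSV file (item_name contains Chinese characters).
path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/bda.csv")
df <- read.df(sqlContext, path, source = "com.databricks.spark.csv")

# This is the call that fails with "embedded nul in string".
head(df)

show(df)
showDF(df)

sparkR.stop()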

Toy data (bda.csv):
sale_ord_det_id,user_log_acct,sale_ord_dt,item_name
4492405119,54321,2015-06-29,【金幻】LED灯泡E27节能灯普通大螺口球泡灯 3W正白
3521699336,54322,2015-03-06,法布尔昆虫记(儿童彩图版 附光盘 套装共10册)
3806707481,54323,2015-04-15,昌林包邮正品中国万能军锹308户外wjq-308多功能折叠锹备车工兵铲
3959751602,54324,2015-05-04,叮当鱼 配件盒 工具盒 大容量可装各种钓鱼附件 钓鱼装备 钓鱼配件 小号配件盒(颜色随机)
3959220084,54322,2015-05-04,迪士尼(Disney)防水夜光米奇电子表 儿童手表女孩运动表学生手表PS021-9
3799693081,54323,2015-04-14,大个子老鼠小个子猫(注音版 套装共9册)
4932295596,54324,2015-08-20,小米 红米Note 联通合约版 白色 联通4G手机 双卡双待 不含合约计划
4014946592,54325,2015-05-11,Jeep/吉普童装 2015新款女童 男童防晒衣 儿童防紫外线外套皮肤 粉红 130cm
4025988780,54326,2015-05-12,民间艺人 带盒荧光棒 钓鱼小配件夜钓专用夜光棒发光棒垂钓小渔具 圆柱20支装 2.9mm