This error message no longer appears now that I have upgraded to 1.6.0.

--
Cheers,
Todd Leo
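P.S. For anyone landing on this thread with the same symptom on 1.5.x, here is a minimal sketch of the setup that triggered the problem and of the temporary workaround (simply not setting the serializer). The programmatic form and the app name are illustrative only; the thread itself only used the equivalent --conf flag on spark-shell / spark-submit.

import org.apache.spark.{SparkConf, SparkContext}

// Setting the serializer explicitly reproduces the setup that showed the
// garbled DataFrame output on 1.5.x. Leaving the .set(...) line out falls
// back to the default JavaSerializer, which was the temporary workaround
// until the upgrade to 1.6.0.
val conf = new SparkConf()
  .setAppName("csv-read-check") // placeholder name, not from the thread
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(conf)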
On Tue, Feb 9, 2016 at 9:07 AM SLiZn Liu <sliznmail...@gmail.com> wrote:

> At least it works for me, though; I temporarily disabled the Kryo
> serializer until I upgrade to 1.6.0. Appreciate your update. :)
>
> Luciano Resende <luckbr1...@gmail.com> wrote on Tue, Feb 9, 2016 at 02:37:
>
>> Sorry, same expected results with trunk and the Kryo serializer.
>>
>> On Mon, Feb 8, 2016 at 4:15 AM, SLiZn Liu <sliznmail...@gmail.com> wrote:
>>
>>> I’ve found the trigger of my issue: if I start my spark-shell or submit
>>> with spark-submit with --conf
>>> spark.serializer=org.apache.spark.serializer.KryoSerializer, the
>>> DataFrame content goes wrong, as I described earlier.
>>>
>>> On Mon, Feb 8, 2016 at 5:42 PM SLiZn Liu <sliznmail...@gmail.com> wrote:
>>>
>>>> Thanks Luciano, now it looks like I’m the only one who has this issue.
>>>> My options have narrowed down to upgrading my Spark to 1.6.0, to see
>>>> if this issue is gone.
>>>>
>>>> —
>>>> Cheers,
>>>> Todd Leo
>>>>
>>>> On Mon, Feb 8, 2016 at 2:12 PM Luciano Resende <luckbr1...@gmail.com> wrote:
>>>>
>>>>> I tried 1.5.0, 1.6.0 and 2.0.0 trunk with
>>>>> com.databricks:spark-csv_2.10:1.3.0 and got the expected results,
>>>>> where the columns are read properly.
>>>>>
>>>>> +----------+----------------------+
>>>>> |C0        |C1                    |
>>>>> +----------+----------------------+
>>>>> |1446566430| 2015-11-04<SP>00:00:30|
>>>>> |1446566430| 2015-11-04<SP>00:00:30|
>>>>> |1446566430| 2015-11-04<SP>00:00:30|
>>>>> |1446566430| 2015-11-04<SP>00:00:30|
>>>>> |1446566430| 2015-11-04<SP>00:00:30|
>>>>> |1446566431| 2015-11-04<SP>00:00:31|
>>>>> |1446566431| 2015-11-04<SP>00:00:31|
>>>>> |1446566431| 2015-11-04<SP>00:00:31|
>>>>> |1446566431| 2015-11-04<SP>00:00:31|
>>>>> |1446566431| 2015-11-04<SP>00:00:31|
>>>>> +----------+----------------------+
>>>>>
>>>>> On Sat, Feb 6, 2016 at 11:44 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:
>>>>>
>>>>>> Hi Spark Users Group,
>>>>>>
>>>>>> I have a CSV file to analyze with Spark, but I’m having trouble
>>>>>> importing it as a DataFrame.
>>>>>>
>>>>>> Here’s a minimal reproducible example. Suppose I have a
>>>>>> 10(rows)x2(cols) space-delimited CSV file, shown below:
>>>>>>
>>>>>> 1446566430 2015-11-04<SP>00:00:30
>>>>>> 1446566430 2015-11-04<SP>00:00:30
>>>>>> 1446566430 2015-11-04<SP>00:00:30
>>>>>> 1446566430 2015-11-04<SP>00:00:30
>>>>>> 1446566430 2015-11-04<SP>00:00:30
>>>>>> 1446566431 2015-11-04<SP>00:00:31
>>>>>> 1446566431 2015-11-04<SP>00:00:31
>>>>>> 1446566431 2015-11-04<SP>00:00:31
>>>>>> 1446566431 2015-11-04<SP>00:00:31
>>>>>> 1446566431 2015-11-04<SP>00:00:31
>>>>>>
>>>>>> The <SP> in column 2 represents a sub-delimiter within that column,
>>>>>> and the file is stored on HDFS; let’s say the path is
>>>>>> hdfs:///tmp/1.csv.
>>>>>>
>>>>>> I’m using spark-csv to import this file as a Spark DataFrame:
>>>>>>
>>>>>> sqlContext.read.format("com.databricks.spark.csv")
>>>>>>   .option("header", "false")      // the file has no header line
>>>>>>   .option("inferSchema", "false") // do not infer data types
>>>>>>   .option("delimiter", " ")
>>>>>>   .load("hdfs:///tmp/1.csv")
>>>>>>   .show
>>>>>>
>>>>>> Oddly, the output shows only a part of each column:
>>>>>>
>>>>>> [image: Screenshot from 2016-02-07 15-27-51.png]
>>>>>>
>>>>>> and even the boundary of the table isn’t drawn correctly. I also
>>>>>> tried the other way of reading the CSV file, with
>>>>>> sc.textFile(...).map(_.split(" ")) and sqlContext.createDataFrame,
>>>>>> and the result is the same.
>>>>>>
>>>>>> Can someone point out where I went wrong?
>>>>>>
>>>>>> —
>>>>>> BR,
>>>>>> Todd Leo
>>>>>
>>>>> --
>>>>> Luciano Resende
>>>>> http://people.apache.org/~lresende
>>>>> http://twitter.com/lresende1975
>>>>> http://lresende.blogspot.com/
>>
>> --
>> Luciano Resende
>> http://people.apache.org/~lresende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
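For completeness, the sc.textFile(...) route that the original message says was also tried could look roughly like the sketch below on the 1.5/1.6 API, assuming the sc and sqlContext provided by spark-shell. The column names C0/C1 mirror the spark-csv defaults shown above, and the limit-2 split is an assumption so that the sub-delimiter inside column 2 is not treated as a field separator.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Two string columns, matching what spark-csv produces when header and
// inferSchema are both disabled.
val schema = StructType(Seq(
  StructField("C0", StringType, nullable = true),
  StructField("C1", StringType, nullable = true)
))

// Split on the first space only, so the sub-delimiter inside column 2
// (shown as <SP> in the sample) stays part of the second field.
val rows = sc.textFile("hdfs:///tmp/1.csv").map { line =>
  val fields = line.split(" ", 2)
  Row(fields(0), fields(1))
}

val df = sqlContext.createDataFrame(rows, schema)
df.show(false) // truncate = false, to see the full column values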