Hi,

I've been trying Drill because it looks very promising, but I've run into some issues that I couldn't solve. I'm wondering whether I'm misconfiguring something or whether there's a bug.

The first issue is that when I try to read a Sequence file, the content I get is different from what is in the original file:

$ hadoop fs -text /user/ctarsa/esborram2.seq
16/11/24 16:27:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
key0       value0
key1       value1
key10      value10
key        {"review":"{"author":"àéïöç"}"}
key {"review":{"scrapedDate":1475060474000,"productReviewId":"1009214395780445","dataProviderId":643,"productInfoId":45782422,"approxPublishedDate":1465164000000,"firstScrapedDate":1475060474000,"externalId":"1009214395780445"

When I read it back from Drill:

0: jdbc:drill:zk=local> select (convert_from(binary_key,'UTF8')), (convert_from(binary_value,'UTF8')) from dfs.`hdfs:/user/ctarsa/esborram2.seq`;
+--------+--------+
| EXPR$0 | EXPR$1 |
+--------+--------+
| key0 | value0 |
| key1 | value1 |
| key10 | value10 |
| key | ${"review":"{"author":"àéïöç"}"} |
| key | ��{"review":{"scrapedDate":1475060474000,"productReviewId":"1009214395780445","dataProviderId":643,"productInfoId":45782422,"approxPublishedDate":1465164000000,"firstScrapedDate":1475060474000,"externalId":"1009214395780445" |
+--------+--------+
5 rows selected (0.308 seconds)

Notice that there are some extra characters in the output: a `$` before the first JSON value and `��` before the second. Also notice that on the first rows the | characters don't seem to be aligned.
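
One check that may help narrow this down, offered as an unconfirmed hypothesis rather than a diagnosis: if the values were written as Hadoop `Text`, each serialized value starts with a variable-length integer holding its byte length, and a reader that returns the raw serialized bytes would expose that prefix. The Python sketch below re-implements that length encoding for non-negative numbers as I understand `WritableUtils.writeVInt` to behave (the function name `hadoop_vint` is made up for this example) and compares it with the stray characters.

```python
def hadoop_vint(n: int) -> bytes:
    """Encode a non-negative int as a Hadoop-style variable-length integer.

    Assumption: values up to 127 take a single byte; larger values take a
    marker byte (-112 - size, as an unsigned byte) followed by the value in
    big-endian order.
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative lengths")
    if n <= 127:
        return bytes([n])
    size = (n.bit_length() + 7) // 8
    marker = (-112 - size) & 0xFF
    return bytes([marker]) + n.to_bytes(size, "big")


# The fourth value from the sequence file, with the accented letters escaped.
value = '{"review":"{"author":"\u00e0\u00e9\u00ef\u00f6\u00e7"}"}'
payload = value.encode("utf-8")
prefix = hadoop_vint(len(payload))

print(len(payload), prefix)
# What a UTF-8 decode of prefix + payload would look like.
print((prefix + payload).decode("utf-8", errors="replace"))
```

This prints a byte length of 36 and the prefix `b'$'`, which matches the `$` shown before the fourth value. Under the same assumption, short values such as `value0` would get a non-printing control byte (which could explain the misaligned | characters), and a value between 128 and 255 bytes long would get two bytes that are not valid UTF-8, which would render as `��`.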

I've tried it on a Mac with the latest Drill (1.8.0) and Hadoop 2.6.0-cdh5.4.4, and also on a Linux box. I've also tried different compression settings for the sequence file (no compression, LZO, LZO Block, LZO Record), with the same result.

Can you please help?

Thanks,

Carles
