Hi,
I've been trying Drill because it looks very promising, but I've run into
some issues that I couldn't solve. I'm wondering whether I'm configuring
something incorrectly or whether there's a bug.
The first issue is that when I try to read a sequence file, the content
I get back is different from what is in the original file:
$ hadoop fs -text /user/ctarsa/esborram2.seq
16/11/24 16:27:37 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
key0 value0
key1 value1
key10 value10
key {"review":"{"author":"àéïöç"}"}
key
{"review":{"scrapedDate":1475060474000,"productReviewId":"1009214395780445","dataProviderId":643,"productInfoId":45782422,"approxPublishedDate":1465164000000,"firstScrapedDate":1475060474000,"externalId":"1009214395780445"
When I try to read it back from Drill:
0: jdbc:drill:zk=local> select (convert_from(binary_key,'UTF8')),
(convert_from(binary_value,'UTF8')) from
dfs.`hdfs:/user/ctarsa/esborram2.seq`;
+--------+--------+
| EXPR$0 | EXPR$1 |
+--------+--------+
| key0 | value0 |
| key1 | value1 |
| key10 | value10 |
| key | ${"review":"{"author":"àéïöç"}"} |
| key |
��{"review":{"scrapedDate":1475060474000,"productReviewId":"1009214395780445","dataProviderId":643,"productInfoId":45782422,"approxPublishedDate":1465164000000,"firstScrapedDate":1475060474000,"externalId":"1009214395780445"
|
+--------+--------+
5 rows selected (0.308 seconds)
Notice the extra characters at the start of the last two values (the "$"
and the unreadable characters before the "{"). Also notice that on the
first rows the | don't seem to be aligned.
I've tried it on a Mac with the latest Drill (1.8.0) and Hadoop
2.6.0-cdh5.4.4, and also on a Linux box. I've also tried different
compression settings on the sequence file (no compression, LZO, LZO
Block, LZO Record), with no success.
Can you please help?
Thanks,
Carles