reading different content from sequence files

Carles Tarsà Thu, 24 Nov 2016 08:03:06 -0800

Hi,

I've been trying Drill because it looks very promising, but I've encountered some issues that I couldn't solve. I'm wondering whether I've misconfigured something or whether there's a bug.

The first issue is that when I try to read a sequence file, the content I get is different from the content of the original file.


$ hadoop fs -text /user/ctarsa/esborram2.seq

16/11/24 16:27:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

key0       value0
key1       value1
key10      value10
key        {"review":"{"author":"àéïöç"}"}

key{"review":{"scrapedDate":1475060474000,"productReviewId":"1009214395780445","dataProviderId":643,"productInfoId":45782422,"approxPublishedDate":1465164000000,"firstScrapedDate":1475060474000,"externalId":"1009214395780445"


When I try to read it back from Drill:

0: jdbc:drill:zk=local> select (convert_from(binary_key,'UTF8')),(convert_from(binary_value,'UTF8')) from dfs.`hdfs:/user/ctarsa/esborram2.seq`;

+--------+--------+
| EXPR$0 | EXPR$1 |
+--------+--------+
| key0 | value0 |
| key1 | value1 |
| key10 | value10 |
| key | ${"review":"{"author":"àéïöç"}"} |

| key |��{"review":{"scrapedDate":1475060474000,"productReviewId":"1009214395780445","dataProviderId":643,"productInfoId":45782422,"approxPublishedDate":1465164000000,"firstScrapedDate":1475060474000,"externalId":"1009214395780445"|

+--------+--------+
5 rows selected (0.308 seconds)

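Notice that there are some extra characters at the start of the last two values (marked in red in the original message): a "$" before the first JSON value and two unreadable characters before the second. Also notice that in the first rows the | separators don't seem to be aligned.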
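
A possible explanation, stated as an assumption rather than a confirmed diagnosis: if the values were written as Hadoop Text, each serialized value starts with a variable-length integer holding its byte length, and convert_from(..., 'UTF8') would decode that prefix together with the payload. The first JSON value is 36 bytes long in UTF-8, and byte 36 is the ASCII character "$", which would match the extra character seen above. The sketch below is plain Python with no Hadoop dependency; the helper names read_vint and strip_text_prefix are hypothetical and only illustrate how such a prefix could be decoded and removed.

```python
def read_vint(buf):
    """Decode a Hadoop-style variable-length integer at the start of buf.

    Returns (value, number_of_bytes_consumed).
    """
    first = buf[0] - 256 if buf[0] > 127 else buf[0]  # treat as signed byte
    if first >= -112:
        return first, 1  # small values fit in the first byte itself
    negative = first < -120
    n_bytes = -(first + 120) if negative else -(first + 112)
    value = int.from_bytes(buf[1:1 + n_bytes], "big")
    return (~value if negative else value), 1 + n_bytes


def strip_text_prefix(raw):
    """Drop the assumed length prefix and decode the payload as UTF-8."""
    length, offset = read_vint(raw)
    return raw[offset:offset + length].decode("utf-8")


# The first JSON value from the output above (accented letters as escapes).
payload = '{"review":"{"author":"\u00e0\u00e9\u00ef\u00f6\u00e7"}"}'.encode("utf-8")
raw = bytes([len(payload)]) + payload  # assumed on-disk layout: length, then bytes

print(raw[:1])                 # b'$'
print(strip_text_prefix(raw))  # the JSON text without the leading "$"
```

Under the same assumption, a value longer than 127 bytes would carry a multi-byte prefix whose bytes are not valid UTF-8, which could explain the two unreadable characters, and short values such as value0 would carry an invisible control byte, which could explain the misaligned | separators.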

I've tried it on a Mac with the latest Drill (1.8.0) and hadoop 2.6.0-cdh5.4.4, and also on a Linux box. I've also tried different compression settings on the sequence file (no compression, LZO, LZO Block, LZO Record), with no success.


Can you please help?

Thanks,

Carles
