Hello everyone,
I wanted to run some tests with a large test data set (a .csv file of more than 5 GB).
Unfortunately it fails pretty early again. I have created an EXTERNAL TABLE as
follows:
CREATE EXTERNAL TABLE dfkklocks_hist
(
validfrom timestamp,
validthru timestamp,
client text,
loobj1 text,
lotyp text,
proid text,
lockr text,
fdate date,
tdate date,
gpart text,
vkont text,
cond_loobj text,
actkey text,
uname text,
adatum date,
azeit text,
protected text,
laufd date,
laufi text
)
using csv with ('csvfile.delimiter'='~') location 'file:path/to/csv/file';
Then I create a table with the suffix *_internal that uses the parquet storage
format, as follows:
CREATE TABLE dfkklocks_hist_internal
(
validfrom timestamp,
validthru timestamp,
client text,
loobj1 text,
lotyp text,
proid text,
lockr text,
fdate date,
tdate date,
gpart text,
vkont text,
cond_loobj text,
actkey text,
uname text,
adatum date,
azeit text,
protected text,
laufd date,
laufi text
) using parquet;
The CSV file contains records such as this one:
2014-08-19 21:03:32.78~9999-12-31 23:59:59.999~200~0000000000530010000053~06~01~5~2005-12-31~9999-12-31~0010000053~000000000053~~~FREITAG~2006-06-01~125611~~1800-01-01~
Now I would like to insert the content of the CSV file into the parquet table as
follows:
contract> INSERT INTO dfkklocks_hist_internal SELECT * FROM dfkklocks_hist;
ERROR: Cannot convert Tajo type: TIMESTAMP
java.lang.RuntimeException: Cannot convert Tajo type: TIMESTAMP
at org.apache.tajo.storage.parquet.TajoSchemaConverter.convertColumn(TajoSchemaConverter.java:191)
at org.apache.tajo.storage.parquet.TajoSchemaConverter.convert(TajoSchemaConverter.java:150)
at org.apache.tajo.storage.parquet.TajoWriteSupport.<init>(TajoWriteSupport.java:54)
at org.apache.tajo.storage.parquet.TajoParquetWriter.<init>(TajoParquetWriter.java:80)
at org.apache.tajo.storage.parquet.ParquetAppender.init(ParquetAppender.java:75)
at org.apache.tajo.engine.planner.physical.StoreTableExec.init(StoreTableExec.java:69)
at org.apache.tajo.worker.Task.run(Task.java:423)
at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:425)
at java.lang.Thread.run(Thread.java:745)
Looking at TajoSchemaConverter.java, it seems that a Tajo TIMESTAMP column cannot
currently be written to parquet. Is this assumption correct?
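In case it is useful for a bug report: since the exception is already thrown while the
parquet schema is built in the appender's init, before any row is read, I would expect
the same error to be reproducible with a single TIMESTAMP column (table and column
names below are just placeholders, I have not actually run this yet):
CREATE TABLE ts_only_parquet (ts timestamp) using parquet;
-- presumably fails the same way while the ParquetAppender is initialized:
INSERT INTO ts_only_parquet SELECT validfrom FROM dfkklocks_hist;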
Changing the timestamp values (see the example record) also did not lead to success.
At first I assumed that the timestamps themselves were not valid, but values such as
1970-00-00 00:00:00.000 or 1971-01-01 01:01:01.000 showed no change in behavior.
Are my conclusions so far correct? Is this a known bug? Or am I perhaps doing
something wrong? Is there another approach that could still get me to my goal that I
have not listed here? (One untested idea is sketched below, after the converter code.)
For reference, here is the convertColumn method from TajoSchemaConverter.java:
private Type convertColumn(Column column) {
  TajoDataTypes.Type type = column.getDataType().getType();
  switch (type) {
    case BOOLEAN:
      return primitive(column.getSimpleName(),
          PrimitiveType.PrimitiveTypeName.BOOLEAN);
    case BIT:
    case INT2:
    case INT4:
      return primitive(column.getSimpleName(),
          PrimitiveType.PrimitiveTypeName.INT32);
    case INT8:
      return primitive(column.getSimpleName(),
          PrimitiveType.PrimitiveTypeName.INT64);
    case FLOAT4:
      return primitive(column.getSimpleName(),
          PrimitiveType.PrimitiveTypeName.FLOAT);
    case FLOAT8:
      return primitive(column.getSimpleName(),
          PrimitiveType.PrimitiveTypeName.DOUBLE);
    case CHAR:
    case TEXT:
      return primitive(column.getSimpleName(),
          PrimitiveType.PrimitiveTypeName.BINARY,
          OriginalType.UTF8);
    case PROTOBUF:
      return primitive(column.getSimpleName(),
          PrimitiveType.PrimitiveTypeName.BINARY);
    case BLOB:
      return primitive(column.getSimpleName(),
          PrimitiveType.PrimitiveTypeName.BINARY);
    case INET4:
    case INET6:
      return primitive(column.getSimpleName(),
          PrimitiveType.PrimitiveTypeName.BINARY);
    default:
      throw new RuntimeException("Cannot convert Tajo type: " + type);
  }
}
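The one workaround idea I have not tried yet (only a sketch, assuming Tajo's CAST can
be used like this in an INSERT ... SELECT): declare the two timestamp columns as TEXT
in the parquet table and cast them explicitly while copying, for example:
CREATE TABLE dfkklocks_hist_internal_txt
(
validfrom text,
validthru text,
client text,
-- remaining columns exactly as in dfkklocks_hist_internal
laufi text
) using parquet;

INSERT INTO dfkklocks_hist_internal_txt
SELECT
CAST(validfrom AS text),
CAST(validthru AS text),
client,
-- remaining columns unchanged
laufi
FROM dfkklocks_hist;
Of course the timestamps would then only be stored as strings in parquet and would
have to be cast back when querying, so a proper TIMESTAMP mapping in
TajoSchemaConverter would be much nicer.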
I'm really thankful that there is a community like you out there that helps to sort
out errors like this together.
Have a nice weekend.
Best regards,
Chris