Document Impala and Spark integration known issues & limitations

Change-Id: I993a09a00f5ab0049fec95e967abc1740b44dc8d
Reviewed-on: http://gerrit.cloudera.org:8080/4443
Tested-by: Dan Burkert <d...@cloudera.com>
Reviewed-by: Jean-Daniel Cryans <jdcry...@apache.org>
(cherry picked from commit 92f7c1914ab29061d324a9a38aa5bb05ca598d47)
Reviewed-on: http://gerrit.cloudera.org:8080/4660
Reviewed-by: Dan Burkert <d...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/6b30d7ea
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/6b30d7ea
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/6b30d7ea

Branch: refs/heads/branch-1.0.x
Commit: 6b30d7ea7da7dc54f0b593754b12c04febde0a0c
Parents: 427cf77
Author: Dan Burkert <d...@cloudera.com>
Authored: Fri Sep 16 14:16:36 2016 -0700
Committer: Dan Burkert <d...@cloudera.com>
Committed: Fri Oct 7 18:17:26 2016 +0000

----------------------------------------------------------------------
 docs/developing.adoc              | 14 ++++++++++++++
 docs/kudu_impala_integration.adoc | 22 ++++++++++++++++++++++
 2 files changed, 36 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/6b30d7ea/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index b4d8604..8833369 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -151,6 +151,20 @@ kuduContext.tableExists("another_table")
 kuduContext.deleteTable("unwanted_table")
 ----
 
+=== Spark Integration Known Issues and Limitations
+
+- The Kudu Spark integration is tested and developed against Spark 1.6 and Scala
+  2.10.
+- Kudu tables with a name containing upper case or non-ASCII characters must be
+  assigned an alternate name when registered as a temporary table.
+- Kudu tables with a column name containing upper case or non-ASCII characters
+  may not be used with SparkSQL. Non-primary key columns may be renamed in Kudu
+  to work around this issue.
+- `NULL`, `NOT NULL`, `<>`, `OR`, `LIKE`, and `IN` predicates are not pushed to
+  Kudu, and instead will be evaluated by the Spark task.
+- Kudu does not support all types supported by Spark SQL, such as `Date`,
+  `Decimal`, and complex types.
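[Editor's illustration of the temporary-table naming workaround above — a minimal sketch, not part of the commit. The master address and table names are hypothetical, and the `sqlContext.read ... .kudu` reader is assumed from the kudu-spark bindings shown earlier in developing.adoc:]

[source,scala]
----
import org.apache.kudu.spark.kudu._

// "My_Table" contains an upper case character, so it cannot be registered
// as a SparkSQL temporary table under its own name.
val df = sqlContext.read
  .options(Map(
    "kudu.master" -> "kudu-master.example.com:7051", // hypothetical master address
    "kudu.table"  -> "My_Table"))                    // Kudu table with upper case name
  .kudu

// Register the DataFrame under an all-lower-case alternate name instead.
df.registerTempTable("my_table")

sqlContext.sql("SELECT * FROM my_table")
----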
+
 == Integration with MapReduce, YARN, and Other Frameworks
 
 Kudu was designed to integrate with MapReduce, YARN, Spark, and other frameworks in


http://git-wip-us.apache.org/repos/asf/kudu/blob/6b30d7ea/docs/kudu_impala_integration.adoc
----------------------------------------------------------------------
diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index e2fe89c..ec86c18 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -1083,3 +1083,25 @@ The examples above have only explored a fraction of what you can do with Impala
 - View the link:http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_langref.html[Impala SQL reference].
 - Read about Impala internals or learn how to contribute to Impala on the link:https://github.com/cloudera/Impala/wiki[Impala Wiki].
 - Read about the native link:installation.html#view_api[Kudu APIs].
+
+=== Known Issues and Limitations
+
+- Kudu tables with a name containing upper case or non-ASCII characters must be
+  assigned an alternate name when used as an external table in Impala.
+- Kudu tables with a column name containing upper case or non-ASCII characters
+  may not be used as an external table in Impala. Non-primary key columns may be
+  renamed in Kudu to work around this issue.
+- When creating a Kudu table, the `CREATE TABLE` statement must include the
+  primary key columns before other columns, in primary key order.
+- Kudu tables containing `UNIXTIME_MICROS`-typed columns may not be used as an
+  external table in Impala.
+- Impala cannot create Kudu tables with `TIMESTAMP` or nested-typed columns.
+- Impala cannot update values in primary key columns.
+- `NULL`, `NOT NULL`, `!=`, and `IN` predicates are not pushed to Kudu, and
+  instead will be evaluated by the Impala scan node.
+- Impala cannot specify column encoding or compression during Kudu table
+  creation, or alter a column's encoding or compression.
+- Impala cannot create Kudu tables with bounded range partitions, and cannot
+  alter a table to add or remove range partitions.
+- When bulk writing to a Kudu table, performance may be improved by setting the
+  `batch_size` option (see <<kudu_impala_insert_bulk>>).
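[Editor's illustration of two of the Impala limitations above — a hedged sketch, not part of the commit. The table and column names are hypothetical, and the storage-handler `TBLPROPERTIES` form is assumed from the `CREATE TABLE` examples earlier in kudu_impala_integration.adoc:]

[source,sql]
----
-- Primary key columns (here, id) must come first in the column list,
-- in primary key order.
CREATE TABLE my_first_table (
  id BIGINT,
  name STRING
)
DISTRIBUTE BY HASH (id) INTO 4 BUCKETS
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'my_first_table',
  'kudu.master_addresses' = 'kudu-master.example.com:7051',
  'kudu.key_columns' = 'id'
);

-- An existing Kudu table whose name contains upper case characters
-- must be mapped under an alternate, all-lower-case name.
CREATE EXTERNAL TABLE my_table_alias
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'My_Table'
);
----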