Document Impala and Spark integration known issues & limitations

Change-Id: I993a09a00f5ab0049fec95e967abc1740b44dc8d
Reviewed-on: http://gerrit.cloudera.org:8080/4443
Tested-by: Dan Burkert <d...@cloudera.com>
Reviewed-by: Jean-Daniel Cryans <jdcry...@apache.org>
(cherry picked from commit 92f7c1914ab29061d324a9a38aa5bb05ca598d47)
Reviewed-on: http://gerrit.cloudera.org:8080/4660
Reviewed-by: Dan Burkert <d...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/6b30d7ea
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/6b30d7ea
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/6b30d7ea

Branch: refs/heads/branch-1.0.x
Commit: 6b30d7ea7da7dc54f0b593754b12c04febde0a0c
Parents: 427cf77
Author: Dan Burkert <d...@cloudera.com>
Authored: Fri Sep 16 14:16:36 2016 -0700
Committer: Dan Burkert <d...@cloudera.com>
Committed: Fri Oct 7 18:17:26 2016 +0000

----------------------------------------------------------------------
 docs/developing.adoc              | 14 ++++++++++++++
 docs/kudu_impala_integration.adoc | 22 ++++++++++++++++++++++
 2 files changed, 36 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/6b30d7ea/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index b4d8604..8833369 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -151,6 +151,20 @@ kuduContext.tableExists("another_table")
 kuduContext.deleteTable("unwanted_table")
 ----
 
+=== Spark Integration Known Issues and Limitations
+
+- The Kudu Spark integration is tested and developed against Spark 1.6 and Scala
+  2.10.
+- Kudu tables with a name containing upper case or non-ASCII characters must be
+  assigned an alternate name when registered as a temporary table.
+- Kudu tables with a column name containing upper case or non-ASCII characters
+  may not be used with SparkSQL. Non-primary key columns may be renamed in Kudu
+  to work around this issue.
+- `NULL`, `NOT NULL`, `<>`, `OR`, `LIKE`, and `IN` predicates are not pushed to
+  Kudu, and instead will be evaluated by the Spark task.
+- Kudu does not support all types supported by Spark SQL, such as `Date`,
+  `Decimal`, and complex types.
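[Editor's illustration of the temporary-table naming workaround above — a minimal sketch, not part of the commit. The master address and table names are hypothetical, and the `sqlContext.read ... .kudu` reader is assumed from the kudu-spark bindings shown earlier in developing.adoc:]

[source,scala]
----
import org.apache.kudu.spark.kudu._

// "My_Table" contains an upper case character, so it cannot be registered
// as a SparkSQL temporary table under its own name.
val df = sqlContext.read
  .options(Map(
    "kudu.master" -> "kudu-master.example.com:7051", // hypothetical master address
    "kudu.table"  -> "My_Table"))                    // Kudu table with upper case name
  .kudu

// Register the DataFrame under an all-lower-case alternate name instead.
df.registerTempTable("my_table")

sqlContext.sql("SELECT * FROM my_table")
----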
+
 == Integration with MapReduce, YARN, and Other Frameworks
 
 Kudu was designed to integrate with MapReduce, YARN, Spark, and other frameworks in


http://git-wip-us.apache.org/repos/asf/kudu/blob/6b30d7ea/docs/kudu_impala_integration.adoc
----------------------------------------------------------------------
diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index e2fe89c..ec86c18 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -1083,3 +1083,25 @@ The examples above have only explored a fraction of what you can do with Impala
 - View the link:http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_langref.html[Impala SQL reference].
 - Read about Impala internals or learn how to contribute to Impala on the link:https://github.com/cloudera/Impala/wiki[Impala Wiki].
 - Read about the native link:installation.html#view_api[Kudu APIs].
+
+=== Known Issues and Limitations
+
+- Kudu tables with a name containing upper case or non-ASCII characters must be
+  assigned an alternate name when used as an external table in Impala.
+- Kudu tables with a column name containing upper case or non-ASCII characters
+  may not be used as an external table in Impala. Non-primary key columns may be
+  renamed in Kudu to work around this issue.
+- When creating a Kudu table, the `CREATE TABLE` statement must include the
+  primary key columns before other columns, in primary key order.
+- Kudu tables containing `UNIXTIME_MICROS`-typed columns may not be used as an
+  external table in Impala.
+- Impala cannot create Kudu tables with `TIMESTAMP` or nested-typed columns.
+- Impala cannot update values in primary key columns.
+- `NULL`, `NOT NULL`, `!=`, and `IN` predicates are not pushed to Kudu, and
+  instead will be evaluated by the Impala scan node.
+- Impala cannot specify column encoding or compression during Kudu table
+  creation, or alter a column's encoding or compression.
+- Impala cannot create Kudu tables with bounded range partitions, and cannot
+  alter a table to add or remove range partitions.
+- When bulk writing to a Kudu table, performance may be improved by setting the
+  `batch_size` option (see <<kudu_impala_insert_bulk>>).
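[Editor's illustration of two of the Impala limitations above — a hedged sketch, not part of the commit. The table and column names are hypothetical, and the storage-handler `TBLPROPERTIES` form is assumed from the `CREATE TABLE` examples earlier in kudu_impala_integration.adoc:]

[source,sql]
----
-- Primary key columns (here, id) must come first in the column list,
-- in primary key order.
CREATE TABLE my_first_table (
  id BIGINT,
  name STRING
)
DISTRIBUTE BY HASH (id) INTO 4 BUCKETS
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'my_first_table',
  'kudu.master_addresses' = 'kudu-master.example.com:7051',
  'kudu.key_columns' = 'id'
);

-- An existing Kudu table whose name contains upper case characters
-- must be mapped under an alternate, all-lower-case name.
CREATE EXTERNAL TABLE my_table_alias
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'My_Table'
);
----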