[kudu-CR] Remove default table partitioning
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3131 to look at the new patch set (#17).

Change subject: Remove default table partitioning

Remove default table partitioning

This commit removes the current default of creating tables with range partitioning over the primary key columns with no splits. This default is problematic because it results in a single tablet, which is a known anti-pattern. Kudu can't predict appropriate split rows without knowledge of the dataset, so creating default splits is not technically feasible.

A better default than range partitioning would be to hash partition on the primary key columns with a number of buckets based on the number of tablet servers. Unfortunately, it's similarly difficult to predict an appropriate number of hash buckets without knowledge of the dataset.

Since changing the default would be a breaking change, and we don't currently have a bullet-proof default option, this commit changes the table creator in the C++ and Java clients to force users to explicitly specify range partitioning, hash partitioning, or both. Users who really do want a table with no partitioning (a single tablet) can still explicitly set the range partition columns to an empty list and provide no split rows.
Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064 --- M docs/release_notes.adoc M docs/schema_design.adoc M java/kudu-client-tools/src/main/java/org/kududb/mapreduce/tools/IntegrationTestBigLinkedList.java M java/kudu-client-tools/src/test/java/org/kududb/mapreduce/tools/ITImportCsv.java M java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/kududb/client/CreateTableOptions.java M java/kudu-client/src/main/java/org/kududb/client/KuduClient.java M java/kudu-client/src/test/java/org/kududb/client/BaseKuduTest.java M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduClient.java M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduSession.java M java/kudu-client/src/test/java/org/kududb/client/TestFlexiblePartitioning.java M java/kudu-client/src/test/java/org/kududb/client/TestHybridTime.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduTable.java M java/kudu-client/src/test/java/org/kududb/client/TestLeaderFailover.java M java/kudu-client/src/test/java/org/kududb/client/TestMasterFailover.java M java/kudu-client/src/test/java/org/kududb/client/TestRowErrors.java M java/kudu-client/src/test/java/org/kududb/client/TestRowResult.java M java/kudu-client/src/test/java/org/kududb/client/TestScanPredicate.java M java/kudu-client/src/test/java/org/kududb/client/TestScannerMultiTablet.java M java/kudu-client/src/test/java/org/kududb/client/TestStatistics.java M java/kudu-client/src/test/java/org/kududb/client/TestTimeouts.java M java/kudu-flume-sink/src/test/java/org/kududb/flume/sink/KuduSinkTest.java M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableInputFormat.java M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableOutputFormat.java M 
java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITOutputFormatJob.java M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/DefaultSourceTest.scala M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/TestContext.scala M python/kudu/client.pyx M python/kudu/tests/common.py M python/kudu/tests/test_client.py M src/kudu/benchmarks/tpch/rpc_line_item_dao.cc M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/predicate-test.cc M src/kudu/client/samples/sample.cc M src/kudu/integration-tests/all_types-itest.cc M src/kudu/integration-tests/alter_table-randomized-test.cc M src/kudu/integration-tests/alter_table-test.cc M src/kudu/integration-tests/create-table-itest.cc M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/delete_table-test.cc M src/kudu/integration-tests/full_stack-insert-scan-test.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/linked_list-test-util.h M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/remote_bootstrap-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/update_scan_delta_compact-test.cc M src/kudu/integration-tests/write_throttling-itest.cc M src/kudu/tools/ksck_remote-test.cc 56 files changed, 226 insertions(+), 140 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/3131/17 -- To view, visit http://gerrit.cloudera.org:8080/3131 To unsubscribe, visit
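The commit message's point that the old default produced a single tablet can be made concrete with a back-of-the-envelope model: a table starts with (split rows + 1) range partitions, multiplied by the bucket count of every hash-partitioning dimension. The sketch below encodes only that arithmetic; the class and method names (`TabletCountSketch`, `estimateTablets`) are hypothetical and are not part of the Kudu client API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of the Kudu client) that estimates how many
 * tablets a table starts with, given its range splits and hash dimensions.
 */
final class TabletCountSketch {

  private TabletCountSketch() {}

  /**
   * @param numSplitRows     number of range split rows; N splits give N + 1 range partitions
   * @param hashBucketCounts bucket count for each hash-partitioning dimension
   * @return estimated number of initial tablets
   */
  static int estimateTablets(int numSplitRows, List<Integer> hashBucketCounts) {
    if (numSplitRows < 0) {
      throw new IllegalArgumentException("numSplitRows must be non-negative");
    }
    int tablets = numSplitRows + 1;
    for (int buckets : hashBucketCounts) {
      // This positivity check belongs to the sketch, not to Kudu's own validation.
      if (buckets <= 0) {
        throw new IllegalArgumentException("bucket counts must be positive");
      }
      tablets *= buckets; // each hash dimension multiplies the tablet count
    }
    return tablets;
  }

  public static void main(String[] args) {
    // Old default: range partitioning on the primary key with no splits, so 1 tablet.
    System.out.println(estimateTablets(0, Collections.emptyList()));
    // Hash partitioning into 8 buckets with no range splits, so 8 tablets.
    System.out.println(estimateTablets(0, Arrays.asList(8)));
    // 3 range split rows combined with one 4-bucket hash dimension, so 16 tablets.
    System.out.println(estimateTablets(3, Arrays.asList(4)));
  }
}
```

The model also shows why no single number fits every cluster: the useful tablet count depends on the data and on the number of tablet servers, which is the commit's argument for making the choice explicit.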
[kudu-CR] Remove default table partitioning
Dan Burkert has posted comments on this change.

Change subject: Remove default table partitioning

Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3131/15/docs/schema_design.adoc
File docs/schema_design.adoc:

Line 163: IMPORTANT: Kudu does not provide a default partitioning strategy when creating tables. It
> If you want to be able to link to this, here is how. You need to change it
Done

--
To view, visit http://gerrit.cloudera.org:8080/3131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Dan Burkert
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Misty Stanley-Jones
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: Yes
[kudu-CR] Remove default table partitioning
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3131 to look at the new patch set (#16).

Change subject: Remove default table partitioning

Remove default table partitioning

This commit removes the current default of creating tables with range partitioning over the primary key columns with no splits. This default is problematic because it results in a single tablet, which is a known anti-pattern. Kudu can't predict appropriate split rows without knowledge of the dataset, so creating default splits is not technically feasible.

A better default than range partitioning would be to hash partition on the primary key columns with a number of buckets based on the number of tablet servers. Unfortunately, it's similarly difficult to predict an appropriate number of hash buckets without knowledge of the dataset.

Since changing the default would be a breaking change, and we don't currently have a bullet-proof default option, this commit changes the table creator in the C++ and Java clients to force users to explicitly specify range partitioning, hash partitioning, or both. Users who really do want a table with no partitioning (a single tablet) can still explicitly set the range partition columns to an empty list and provide no split rows.
Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064 --- M docs/release_notes.adoc M docs/schema_design.adoc M java/kudu-client-tools/src/main/java/org/kududb/mapreduce/tools/IntegrationTestBigLinkedList.java M java/kudu-client-tools/src/test/java/org/kududb/mapreduce/tools/ITImportCsv.java M java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/kududb/client/CreateTableOptions.java M java/kudu-client/src/main/java/org/kududb/client/KuduClient.java M java/kudu-client/src/test/java/org/kududb/client/BaseKuduTest.java M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduClient.java M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduSession.java M java/kudu-client/src/test/java/org/kududb/client/TestFlexiblePartitioning.java M java/kudu-client/src/test/java/org/kududb/client/TestHybridTime.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduTable.java M java/kudu-client/src/test/java/org/kududb/client/TestLeaderFailover.java M java/kudu-client/src/test/java/org/kududb/client/TestMasterFailover.java M java/kudu-client/src/test/java/org/kududb/client/TestRowErrors.java M java/kudu-client/src/test/java/org/kududb/client/TestRowResult.java M java/kudu-client/src/test/java/org/kududb/client/TestScanPredicate.java M java/kudu-client/src/test/java/org/kududb/client/TestScannerMultiTablet.java M java/kudu-client/src/test/java/org/kududb/client/TestStatistics.java M java/kudu-client/src/test/java/org/kududb/client/TestTimeouts.java M java/kudu-flume-sink/src/test/java/org/kududb/flume/sink/KuduSinkTest.java M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableInputFormat.java M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableOutputFormat.java M 
java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITOutputFormatJob.java M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/DefaultSourceTest.scala M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/TestContext.scala M python/kudu/client.pyx M python/kudu/tests/common.py M python/kudu/tests/test_client.py M src/kudu/benchmarks/tpch/rpc_line_item_dao.cc M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/predicate-test.cc M src/kudu/client/samples/sample.cc M src/kudu/integration-tests/all_types-itest.cc M src/kudu/integration-tests/alter_table-randomized-test.cc M src/kudu/integration-tests/alter_table-test.cc M src/kudu/integration-tests/create-table-itest.cc M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/delete_table-test.cc M src/kudu/integration-tests/full_stack-insert-scan-test.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/linked_list-test-util.h M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/remote_bootstrap-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/update_scan_delta_compact-test.cc M src/kudu/integration-tests/write_throttling-itest.cc M src/kudu/tools/ksck_remote-test.cc 56 files changed, 221 insertions(+), 140 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/3131/16 -- To view, visit http://gerrit.cloudera.org:8080/3131 To unsubscribe, visit
[kudu-CR] Remove default table partitioning
Misty Stanley-Jones has posted comments on this change.

Change subject: Remove default table partitioning

Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3131/13/docs/release_notes.adoc
File docs/release_notes.adoc:

Line 69: - Default table partitioning has been removed. All tables must now be created
> The specifics of how to set partitioning depends on the client, so I'm not
Maybe a link to how to do it in Impala? I just think that people might read this and say "Meh, I don't know how to do that so I'm going to ignore it." But maybe not.

http://gerrit.cloudera.org:8080/#/c/3131/13/docs/schema_design.adoc
File docs/schema_design.adoc:

Line 178: distribution keyspace. Range partitioning may be configured to use any subset of
> I updated the sentence, let me know if it makes more sense now.
Done

http://gerrit.cloudera.org:8080/#/c/3131/13/java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java:

Line 299: "setRangePartitionColumns or addHashPartitions");
> I personally think documenting setRangePartitionColumns is enough given tha
Done

--
To view, visit http://gerrit.cloudera.org:8080/3131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Dan Burkert
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Misty Stanley-Jones
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: Yes
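Line 299 of AsyncKuduClient.java, discussed in the review above, appears to be part of the client-side check that enforces the new rule. The following is a minimal sketch of that kind of check, assuming a builder that records whether range partition columns were ever set (even to an empty list) and whether any hash dimension was added. `PartitioningOptionsSketch` and its exception message are hypothetical; the method names only echo `setRangePartitionColumns` and `addHashPartitions` from the review and do not reproduce the real `CreateTableOptions` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical stand-in for a table-creation options builder. It is NOT the real
 * org.kududb.client.CreateTableOptions class; it only models the rule that the
 * caller must choose a partitioning scheme explicitly.
 */
final class PartitioningOptionsSketch {
  // null means "never set", which is deliberately different from an empty list.
  private List<String> rangePartitionColumns = null;
  // One entry per hash-partitioning dimension, with its bucket count alongside.
  private final List<List<String>> hashPartitionColumns = new ArrayList<>();
  private final List<Integer> hashBuckets = new ArrayList<>();

  PartitioningOptionsSketch setRangePartitionColumns(List<String> columns) {
    this.rangePartitionColumns = new ArrayList<>(columns);
    return this;
  }

  PartitioningOptionsSketch addHashPartitions(List<String> columns, int buckets) {
    if (columns.isEmpty() || buckets <= 0) {
      throw new IllegalArgumentException("hash partitioning needs columns and a positive bucket count");
    }
    hashPartitionColumns.add(new ArrayList<>(columns));
    hashBuckets.add(buckets);
    return this;
  }

  /** Rejects options that configure neither range nor hash partitioning. */
  void validate() {
    if (rangePartitionColumns == null && hashPartitionColumns.isEmpty()) {
      // Illustrative wording only; the real client's message may differ.
      throw new IllegalArgumentException(
          "Table partitioning must be specified using setRangePartitionColumns or addHashPartitions");
    }
  }

  public static void main(String[] args) {
    // Explicit single-tablet table: empty range partition columns, no split rows.
    new PartitioningOptionsSketch().setRangePartitionColumns(Collections.emptyList()).validate();

    // Hash partitioning on a key column alone is also an explicit choice.
    new PartitioningOptionsSketch().addHashPartitions(Arrays.asList("key"), 4).validate();

    // Nothing configured: rejected instead of silently falling back to a default.
    try {
      new PartitioningOptionsSketch().validate();
      System.out.println("unexpectedly accepted");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Distinguishing `null` from an empty list is what lets a caller still opt into a single tablet explicitly, as the commit message describes, while an untouched builder is rejected.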
[kudu-CR] Remove default table partitioning
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3131 to look at the new patch set (#15).

Change subject: Remove default table partitioning

Remove default table partitioning

This commit removes the current default of creating tables with range partitioning over the primary key columns with no splits. This default is problematic because it results in a single tablet, which is a known anti-pattern. Kudu can't predict appropriate split rows without knowledge of the dataset, so creating default splits is not technically feasible.

A better default than range partitioning would be to hash partition on the primary key columns with a number of buckets based on the number of tablet servers. Unfortunately, it's similarly difficult to predict an appropriate number of hash buckets without knowledge of the dataset.

Since changing the default would be a breaking change, and we don't currently have a bullet-proof default option, this commit changes the table creator in the C++ and Java clients to force users to explicitly specify range partitioning, hash partitioning, or both. Users who really do want a table with no partitioning (a single tablet) can still explicitly set the range partition columns to an empty list and provide no split rows.
Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064 --- M docs/release_notes.adoc M docs/schema_design.adoc M java/kudu-client-tools/src/main/java/org/kududb/mapreduce/tools/IntegrationTestBigLinkedList.java M java/kudu-client-tools/src/test/java/org/kududb/mapreduce/tools/ITImportCsv.java M java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/kududb/client/CreateTableOptions.java M java/kudu-client/src/main/java/org/kududb/client/KuduClient.java M java/kudu-client/src/test/java/org/kududb/client/BaseKuduTest.java M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduClient.java M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduSession.java M java/kudu-client/src/test/java/org/kududb/client/TestFlexiblePartitioning.java M java/kudu-client/src/test/java/org/kududb/client/TestHybridTime.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduTable.java M java/kudu-client/src/test/java/org/kududb/client/TestLeaderFailover.java M java/kudu-client/src/test/java/org/kududb/client/TestMasterFailover.java M java/kudu-client/src/test/java/org/kududb/client/TestRowErrors.java M java/kudu-client/src/test/java/org/kududb/client/TestRowResult.java M java/kudu-client/src/test/java/org/kududb/client/TestScanPredicate.java M java/kudu-client/src/test/java/org/kududb/client/TestScannerMultiTablet.java M java/kudu-client/src/test/java/org/kududb/client/TestStatistics.java M java/kudu-client/src/test/java/org/kududb/client/TestTimeouts.java M java/kudu-flume-sink/src/test/java/org/kududb/flume/sink/KuduSinkTest.java M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableInputFormat.java M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableOutputFormat.java M 
java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITOutputFormatJob.java M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/DefaultSourceTest.scala M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/TestContext.scala M python/kudu/client.pyx M python/kudu/tests/common.py M python/kudu/tests/test_client.py M src/kudu/benchmarks/tpch/rpc_line_item_dao.cc M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/predicate-test.cc M src/kudu/client/samples/sample.cc M src/kudu/integration-tests/all_types-itest.cc M src/kudu/integration-tests/alter_table-randomized-test.cc M src/kudu/integration-tests/alter_table-test.cc M src/kudu/integration-tests/create-table-itest.cc M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/delete_table-test.cc M src/kudu/integration-tests/full_stack-insert-scan-test.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/linked_list-test-util.h M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/remote_bootstrap-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/update_scan_delta_compact-test.cc M src/kudu/integration-tests/write_throttling-itest.cc M src/kudu/tools/ksck_remote-test.cc 56 files changed, 219 insertions(+), 140 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/3131/15 -- To view, visit http://gerrit.cloudera.org:8080/3131 To unsubscribe, visit
[kudu-CR] Remove default table partitioning
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3131 to look at the new patch set (#13).

Change subject: Remove default table partitioning

Remove default table partitioning

This commit removes the current default of creating tables with range partitioning over the primary key columns with no splits. This default is problematic because it results in a single tablet, which is a known anti-pattern. Kudu can't predict appropriate split rows without knowledge of the dataset, so creating default splits is not technically feasible.

A better default than range partitioning would be to hash partition on the primary key columns with a number of buckets based on the number of tablet servers. Unfortunately, it's similarly difficult to predict an appropriate number of hash buckets without knowledge of the dataset.

Since changing the default would be a breaking change, and we don't currently have a bullet-proof default option, this commit changes the table creator in the C++ and Java clients to force users to explicitly specify range partitioning, hash partitioning, or both. Users who really do want a table with no partitioning (a single tablet) can still explicitly set the range partition columns to an empty list and provide no split rows.
Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064 --- M docs/release_notes.adoc M docs/schema_design.adoc M java/kudu-client-tools/src/main/java/org/kududb/mapreduce/tools/IntegrationTestBigLinkedList.java M java/kudu-client-tools/src/test/java/org/kududb/mapreduce/tools/ITImportCsv.java M java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/kududb/client/CreateTableOptions.java M java/kudu-client/src/main/java/org/kududb/client/KuduClient.java M java/kudu-client/src/test/java/org/kududb/client/BaseKuduTest.java M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduClient.java M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduSession.java M java/kudu-client/src/test/java/org/kududb/client/TestFlexiblePartitioning.java M java/kudu-client/src/test/java/org/kududb/client/TestHybridTime.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduTable.java M java/kudu-client/src/test/java/org/kududb/client/TestLeaderFailover.java M java/kudu-client/src/test/java/org/kududb/client/TestMasterFailover.java M java/kudu-client/src/test/java/org/kududb/client/TestRowErrors.java M java/kudu-client/src/test/java/org/kududb/client/TestRowResult.java M java/kudu-client/src/test/java/org/kududb/client/TestScanPredicate.java M java/kudu-client/src/test/java/org/kududb/client/TestScannerMultiTablet.java M java/kudu-client/src/test/java/org/kududb/client/TestStatistics.java M java/kudu-client/src/test/java/org/kududb/client/TestTimeouts.java M java/kudu-flume-sink/src/test/java/org/kududb/flume/sink/KuduSinkTest.java M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableInputFormat.java M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableOutputFormat.java M 
java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITOutputFormatJob.java M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/DefaultSourceTest.scala M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/TestContext.scala M python/kudu/client.pyx M python/kudu/tests/common.py M python/kudu/tests/test_client.py M src/kudu/benchmarks/tpch/rpc_line_item_dao.cc M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/predicate-test.cc M src/kudu/client/samples/sample.cc M src/kudu/integration-tests/all_types-itest.cc M src/kudu/integration-tests/alter_table-randomized-test.cc M src/kudu/integration-tests/alter_table-test.cc M src/kudu/integration-tests/create-table-itest.cc M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/delete_table-test.cc M src/kudu/integration-tests/full_stack-insert-scan-test.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/linked_list-test-util.h M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/remote_bootstrap-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/update_scan_delta_compact-test.cc M src/kudu/integration-tests/write_throttling-itest.cc M src/kudu/tools/ksck_remote-test.cc 56 files changed, 219 insertions(+), 140 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/3131/13 -- To view, visit http://gerrit.cloudera.org:8080/3131 To unsubscribe, visit
[kudu-CR] Remove default table partitioning
Dan Burkert has posted comments on this change.

Change subject: Remove default table partitioning

Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3131/9/java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java
File java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java:

Line 24
> Nit: don't unroll.
Done

http://gerrit.cloudera.org:8080/#/c/3131/9/java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala:

Line 20: import java.util
> If you're not changing file contents, could you avoid changing the import o
Done

--
To view, visit http://gerrit.cloudera.org:8080/3131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Dan Burkert
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Misty Stanley-Jones
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: Yes
[kudu-CR] Remove default table partitioning
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3131 to look at the new patch set (#12).

Change subject: Remove default table partitioning

Remove default table partitioning

This commit removes the current default of creating tables with range partitioning over the primary key columns with no splits. This default is problematic because it results in a single tablet, which is a known anti-pattern. Kudu can't predict appropriate split rows without knowledge of the dataset, so creating default splits is not technically feasible.

A better default than range partitioning would be to hash partition on the primary key columns with a number of buckets based on the number of tablet servers. Unfortunately, it's similarly difficult to predict an appropriate number of hash buckets without knowledge of the dataset.

Since changing the default would be a breaking change, and we don't currently have a bullet-proof default option, this commit changes the table creator in the C++ and Java clients to force users to explicitly specify range partitioning, hash partitioning, or both. Users who really do want a table with no partitioning (a single tablet) can still explicitly set the range partition columns to an empty list and provide no split rows.
Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064 --- M docs/release_notes.adoc M docs/schema_design.adoc M java/kudu-client-tools/src/main/java/org/kududb/mapreduce/tools/IntegrationTestBigLinkedList.java M java/kudu-client-tools/src/test/java/org/kududb/mapreduce/tools/ITImportCsv.java M java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/kududb/client/CreateTableOptions.java M java/kudu-client/src/main/java/org/kududb/client/KuduClient.java M java/kudu-client/src/test/java/org/kududb/client/BaseKuduTest.java M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduClient.java M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduSession.java M java/kudu-client/src/test/java/org/kududb/client/TestFlexiblePartitioning.java M java/kudu-client/src/test/java/org/kududb/client/TestHybridTime.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java M java/kudu-client/src/test/java/org/kududb/client/TestKuduTable.java M java/kudu-client/src/test/java/org/kududb/client/TestLeaderFailover.java M java/kudu-client/src/test/java/org/kududb/client/TestMasterFailover.java M java/kudu-client/src/test/java/org/kududb/client/TestRowErrors.java M java/kudu-client/src/test/java/org/kududb/client/TestRowResult.java M java/kudu-client/src/test/java/org/kududb/client/TestScanPredicate.java M java/kudu-client/src/test/java/org/kududb/client/TestScannerMultiTablet.java M java/kudu-client/src/test/java/org/kududb/client/TestStatistics.java M java/kudu-client/src/test/java/org/kududb/client/TestTimeouts.java M java/kudu-flume-sink/src/test/java/org/kududb/flume/sink/KuduSinkTest.java M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableInputFormat.java M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableOutputFormat.java M 
java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITOutputFormatJob.java M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/DefaultSourceTest.scala M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/TestContext.scala M python/kudu/client.pyx M python/kudu/tests/common.py M python/kudu/tests/test_client.py M src/kudu/benchmarks/tpch/rpc_line_item_dao.cc M src/kudu/client/client-test.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/predicate-test.cc M src/kudu/client/samples/sample.cc M src/kudu/integration-tests/all_types-itest.cc M src/kudu/integration-tests/alter_table-randomized-test.cc M src/kudu/integration-tests/alter_table-test.cc M src/kudu/integration-tests/create-table-itest.cc M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/delete_table-test.cc M src/kudu/integration-tests/full_stack-insert-scan-test.cc M src/kudu/integration-tests/fuzz-itest.cc M src/kudu/integration-tests/linked_list-test-util.h M src/kudu/integration-tests/master_failover-itest.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/remote_bootstrap-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/ts_itest-base.h M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/integration-tests/update_scan_delta_compact-test.cc M src/kudu/integration-tests/write_throttling-itest.cc M src/kudu/tools/ksck_remote-test.cc 56 files changed, 218 insertions(+), 139 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/3131/12 -- To view, visit http://gerrit.cloudera.org:8080/3131 To unsubscribe, visit
[kudu-CR] Remove default table partitioning
Adar Dembo has posted comments on this change.

Change subject: Remove default table partitioning

Patch Set 11:

Python tests are still broken, and it looks like there's an RpcBenchmark failure too?

Oh, and I think you missed my two comments from PS9.

--
To view, visit http://gerrit.cloudera.org:8080/3131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Dan Burkert
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Misty Stanley-Jones
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: No
[kudu-CR] Remove default table partitioning
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3131 to look at the new patch set (#11).

Change subject: Remove default table partitioning

Remove default table partitioning

This commit removes the current default of creating tables with range partitioning over the primary key columns with no splits. This default is problematic because it results in a single tablet, which is a known anti-pattern. Kudu can't predict appropriate split rows without knowledge of the dataset, so creating default splits is not technically feasible.

A better default than range partitioning would be to hash partition on the primary key columns with a number of buckets based on the number of tablet servers. Unfortunately, it's similarly difficult to predict an appropriate number of hash buckets without knowledge of the dataset.

Since changing the default would be a breaking change, and we don't currently have a bullet-proof default option, this commit changes the table creator in the C++ and Java clients to force users to explicitly specify range partitioning, hash partitioning, or both. Users who really do want a table with no partitioning (a single tablet) can still explicitly set the range partition columns to an empty list and provide no split rows.
Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064
---
M docs/release_notes.adoc
M docs/schema_design.adoc
M java/kudu-client-tools/src/main/java/org/kududb/mapreduce/tools/IntegrationTestBigLinkedList.java
M java/kudu-client-tools/src/test/java/org/kududb/mapreduce/tools/ITImportCsv.java
M java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/kududb/client/CreateTableOptions.java
M java/kudu-client/src/main/java/org/kududb/client/KuduClient.java
M java/kudu-client/src/test/java/org/kududb/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduClient.java
M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduSession.java
M java/kudu-client/src/test/java/org/kududb/client/TestFlexiblePartitioning.java
M java/kudu-client/src/test/java/org/kududb/client/TestHybridTime.java
M java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java
M java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java
M java/kudu-client/src/test/java/org/kududb/client/TestKuduTable.java
M java/kudu-client/src/test/java/org/kududb/client/TestLeaderFailover.java
M java/kudu-client/src/test/java/org/kududb/client/TestMasterFailover.java
M java/kudu-client/src/test/java/org/kududb/client/TestRowErrors.java
M java/kudu-client/src/test/java/org/kududb/client/TestRowResult.java
M java/kudu-client/src/test/java/org/kududb/client/TestScanPredicate.java
M java/kudu-client/src/test/java/org/kududb/client/TestScannerMultiTablet.java
M java/kudu-client/src/test/java/org/kududb/client/TestStatistics.java
M java/kudu-client/src/test/java/org/kududb/client/TestTimeouts.java
M java/kudu-flume-sink/src/test/java/org/kududb/flume/sink/KuduSinkTest.java
M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableInputFormat.java
M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableOutputFormat.java
M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITOutputFormatJob.java
M java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala
M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/TestContext.scala
M python/kudu/tests/common.py
M python/kudu/tests/test_client.py
M src/kudu/benchmarks/tpch/rpc_line_item_dao.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/predicate-test.cc
M src/kudu/client/samples/sample.cc
M src/kudu/integration-tests/all_types-itest.cc
M src/kudu/integration-tests/alter_table-randomized-test.cc
M src/kudu/integration-tests/alter_table-test.cc
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/full_stack-insert-scan-test.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/linked_list-test-util.h
M src/kudu/integration-tests/master_failover-itest.cc
M src/kudu/integration-tests/master_replication-itest.cc
M src/kudu/integration-tests/remote_bootstrap-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/ts_itest-base.h
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/update_scan_delta_compact-test.cc
M src/kudu/integration-tests/write_throttling-itest.cc
M src/kudu/tools/ksck_remote-test.cc
56 files changed, 223 insertions(+), 141 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/3131/11
--
To view, visit http://gerrit.cloudera.org:8080/3131
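The rule described in the commit message can be modeled in a few lines. This is a hypothetical, self-contained Java sketch, not the actual Kudu `CreateTableOptions` code: the class `PartitioningRuleSketch`, its nested `TableOptions`, and the methods `setRangePartitionColumns`, `addHashPartitions`, `addSplitRow`, `validate`, and `tabletCount` are illustrative names assumed for this example. It also assumes that the tablet count is the product of the hash bucket count and the number of range partitions (split rows + 1), and, as a simplification, that split rows require explicit, non-empty range columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitioningRuleSketch {

  /** Hypothetical stand-in for a create-table options builder (not Kudu's class). */
  static final class TableOptions {
    // null means "range partitioning never configured"; an empty list is an
    // explicit request for no range partitioning.
    private List<String> rangeColumns = null;
    private final List<String> hashColumns = new ArrayList<>();
    private int hashBuckets = 1;
    private int splitRows = 0;

    TableOptions setRangePartitionColumns(List<String> columns) {
      rangeColumns = new ArrayList<>(columns);
      return this;
    }

    TableOptions addHashPartitions(List<String> columns, int buckets) {
      if (buckets < 2) {
        throw new IllegalArgumentException("hash partitioning needs at least 2 buckets");
      }
      hashColumns.addAll(columns);
      hashBuckets *= buckets; // multiple hash levels multiply the bucket count
      return this;
    }

    TableOptions addSplitRow() {
      splitRows++;
      return this;
    }

    /** Fails unless the caller explicitly chose range or hash partitioning. */
    void validate() {
      if (rangeColumns == null && hashColumns.isEmpty()) {
        throw new IllegalStateException(
            "table partitioning must be specified using range or hash partitioning");
      }
      // Simplifying assumption of this sketch only.
      if (splitRows > 0 && (rangeColumns == null || rangeColumns.isEmpty())) {
        throw new IllegalStateException("split rows require range partition columns");
      }
    }

    /** Assumed model: one tablet per hash bucket per range partition. */
    int tabletCount() {
      validate();
      return hashBuckets * (splitRows + 1);
    }
  }

  public static void main(String[] args) {
    // Neither range nor hash partitioning: rejected by the new rule.
    try {
      new TableOptions().validate();
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }

    // An explicitly empty range column list with no split rows: one tablet.
    TableOptions single = new TableOptions().setRangePartitionColumns(Collections.emptyList());
    System.out.println("single-tablet table: " + single.tabletCount()); // 1

    // Hash partitioning on a hypothetical "key" column into 4 buckets.
    TableOptions hashed = new TableOptions().addHashPartitions(Arrays.asList("key"), 4);
    System.out.println("hash-partitioned table: " + hashed.tabletCount()); // 4
  }
}
```

The `null` versus empty-list distinction is the key design point in this sketch: it lets "not configured" be rejected while "explicitly no range partitioning" is still accepted, which is the escape hatch the commit message describes for users who really want a single tablet.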
[kudu-CR] Remove default table partitioning
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3131 to look at the new patch set (#10).

Change subject: Remove default table partitioning

Remove default table partitioning

This commit removes the current default of creating tables with range partitioning over the primary key columns with no splits. This default is problematic because it results in a single tablet, which is a known anti-pattern. Kudu can't predict appropriate split rows without knowledge of the dataset, so creating default splits is not technically feasible.

A better default than range partitioning would be to hash partition on the primary key columns, with a number of buckets based on the number of tablet servers. Unfortunately, it's similarly difficult to predict an appropriate number of hash buckets without knowledge of the dataset.

Since changing the default would be a breaking change, and we don't currently have a bullet-proof default option, this commit changes the table creator in the C++ and Java clients to force users to explicitly specify at least one of range or hash partitioning. Users who really do want a table with no partitioning (a single tablet) can still explicitly set the range partition columns to an empty list and provide no split rows.
Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064
---
M docs/release_notes.adoc
M docs/schema_design.adoc
M java/kudu-client-tools/src/main/java/org/kududb/mapreduce/tools/IntegrationTestBigLinkedList.java
M java/kudu-client-tools/src/test/java/org/kududb/mapreduce/tools/ITImportCsv.java
M java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/kududb/client/CreateTableOptions.java
M java/kudu-client/src/main/java/org/kududb/client/KuduClient.java
M java/kudu-client/src/test/java/org/kududb/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduClient.java
M java/kudu-client/src/test/java/org/kududb/client/TestAsyncKuduSession.java
M java/kudu-client/src/test/java/org/kududb/client/TestFlexiblePartitioning.java
M java/kudu-client/src/test/java/org/kududb/client/TestHybridTime.java
M java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java
M java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java
M java/kudu-client/src/test/java/org/kududb/client/TestKuduTable.java
M java/kudu-client/src/test/java/org/kududb/client/TestLeaderFailover.java
M java/kudu-client/src/test/java/org/kududb/client/TestMasterFailover.java
M java/kudu-client/src/test/java/org/kududb/client/TestRowErrors.java
M java/kudu-client/src/test/java/org/kududb/client/TestRowResult.java
M java/kudu-client/src/test/java/org/kududb/client/TestScanPredicate.java
M java/kudu-client/src/test/java/org/kududb/client/TestScannerMultiTablet.java
M java/kudu-client/src/test/java/org/kududb/client/TestStatistics.java
M java/kudu-client/src/test/java/org/kududb/client/TestTimeouts.java
M java/kudu-flume-sink/src/test/java/org/kududb/flume/sink/KuduSinkTest.java
M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableInputFormat.java
M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITKuduTableOutputFormat.java
M java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/ITOutputFormatJob.java
M java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala
M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/TestContext.scala
M src/kudu/benchmarks/tpch/rpc_line_item_dao.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/predicate-test.cc
M src/kudu/client/samples/sample.cc
M src/kudu/integration-tests/all_types-itest.cc
M src/kudu/integration-tests/alter_table-randomized-test.cc
M src/kudu/integration-tests/alter_table-test.cc
M src/kudu/integration-tests/create-table-itest.cc
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/full_stack-insert-scan-test.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/linked_list-test-util.h
M src/kudu/integration-tests/master_failover-itest.cc
M src/kudu/integration-tests/master_replication-itest.cc
M src/kudu/integration-tests/remote_bootstrap-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/ts_itest-base.h
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/update_scan_delta_compact-test.cc
M src/kudu/integration-tests/write_throttling-itest.cc
M src/kudu/tools/ksck_remote-test.cc
54 files changed, 216 insertions(+), 139 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/3131/10
--
To view, visit http://gerrit.cloudera.org:8080/3131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
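To illustrate why the commit message calls hash partitioning on the primary key a better default than a single unsplit range, here is a hypothetical Java sketch that assigns keys to buckets with `String.hashCode()` and `Math.floorMod`. Kudu's real hash function and key encoding are different; the class name `HashBucketSketch`, the key format `row-N`, and the choice of four buckets (standing in for four tablet servers) are assumptions made only for this example.

```java
import java.util.Arrays;

public class HashBucketSketch {

  /** Maps a primary key to one of numBuckets buckets (stand-in hash, not Kudu's). */
  static int bucketFor(String primaryKey, int numBuckets) {
    // floorMod keeps the result in [0, numBuckets) even for negative hash codes.
    return Math.floorMod(primaryKey.hashCode(), numBuckets);
  }

  /** Counts how many of the keys "row-0" .. "row-(numKeys-1)" land in each bucket. */
  static int[] bucketCounts(int numKeys, int numBuckets) {
    int[] counts = new int[numBuckets];
    for (int i = 0; i < numKeys; i++) {
      counts[bucketFor("row-" + i, numBuckets)]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    int tabletServers = 4; // hypothetical cluster size used as the bucket count
    // One bucket models the old default: every row goes to a single tablet.
    System.out.println("1 bucket: " + Arrays.toString(bucketCounts(1000, 1)));
    System.out.println(tabletServers + " buckets: "
        + Arrays.toString(bucketCounts(1000, tabletServers)));
  }
}
```

The sketch only shows the mechanics of spreading writes across buckets. As the commit message points out, picking the right bucket count still depends on the data set and the cluster, which is why the change requires users to choose explicitly rather than adopting this as a new default.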
[kudu-CR] Remove default table partitioning
Adar Dembo has posted comments on this change.

Change subject: Remove default table partitioning

Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3131/9/java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java
File java/kudu-client/src/test/java/org/kududb/client/TestKuduSession.java:

Line 24
Nit: don't unroll.

http://gerrit.cloudera.org:8080/#/c/3131/9/java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala:

Line 20: import java.util
If you're not changing file contents, could you avoid changing the import order?

--
To view, visit http://gerrit.cloudera.org:8080/3131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Dan Burkert
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Misty Stanley-Jones
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: Yes
[kudu-CR] Remove default table partitioning
Dan Burkert has posted comments on this change.

Change subject: Remove default table partitioning

Patch Set 9:

Looks like only the Python tests are still failing now.

--
To view, visit http://gerrit.cloudera.org:8080/3131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7021d7950f8dbb4918503ea6fab2e6ee35076064
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Dan Burkert
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Misty Stanley-Jones
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: No