[kudu-CR] KUDU-2437 Generate ScanToken from small chunks in tablet

2018-05-18 Thread Yao Xu (Code Review)
Yao Xu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10406 )

Change subject: KUDU-2437 Generate ScanToken from small chunks in tablet
..


Patch Set 4:

Thank Todd for the question pointed out, we will modify the patch according to 
your design.


--
To view, visit http://gerrit.cloudera.org:8080/10406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
Gerrit-Change-Number: 10406
Gerrit-PatchSet: 4
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Yao Xu 
Gerrit-Comment-Date: Fri, 18 May 2018 06:39:17 +
Gerrit-HasComments: No


Re: [kudu-CR] KUDU-2437 Generate ScanToken from small chunks in tablet

2018-05-18 Thread 徐瑶
Hi Todd,

Thank you for the question pointed out, we will modify the patch according
to your design.

Yao

2018-05-18 1:13 GMT+08:00 Todd Lipcon (Code Review) :

> Todd Lipcon *posted comments* on this change.
>
> View Change 
>
> Patch set 4:
>
> Hi Xu. Thanks for working on this! It will be really great to decouple the
> available parallelism from the number of partitions.
>
> I have a couple concerns about the design in this patch, though.
>
> 1) periodic reporting of the bounds to the master: for our eventual
> scalability goals of thousands of nodes, each with thousands of tablets, I
> think this may be too much information to try to keep in memory on the
> master and to report on a periodic basis. Especially considering that many
> tablets may be cold (bounds not changing), it's a lot of redundant
> processing. Additionally, with it kept only in memory (not persisted) this
> means that shortly following a restart or failover, the master will not
> have the chunk bounds populated and the query plans submitted by users may
> change dramatically.
>
> 2) including the bounds in every GetTabletLocations call is also somewhat
> expensive from a size/performance perspective. In the case of writes or
> scans, they aren't needed. It would be better to avoid the expense for
> these common cases and only worry about the bounds for the specific use
> case you are targeting (generating splits for tasks)
>
>
> 3) from a security perspective, I'm not sure it's safe to include the
> bounds information in the 'getTabletLocations' RPC response. When we start
> adding more fine-grained authorization it's likely that we will want to
> provide location information to administrators/tools (eg to facilitate
> balancing or other ops workflows) but those administrators may not have
> access to read the underlying data. The bounds themselves can likely expose
> user data (eg imagine that the primary key is an email address or
> government ID number that should not be exposed)
>
>
> I'd like to consider an alternate design in which we add a new RPC to the
> tablet server itself, something like:
>
> message SplitKeyRangeRequest {
>   bytes tablet_id
>   bytes start_pk
>   bytes end_pk
>
>   // Number of bytes to try to return in each chunk. This is a hint.
>   // The tablet server may return chunks larger or smaller than this value.
>   int64 target_chunk_size_bytes
>
>   // The columns to consider when chunking.
>   // If specified, then the size estimate used for 'target_chunk_size_bytes'
>   // should only include these columns. This can be used if a query will
>   // only scan a certain subset of the columns.
>   repeated int32 col_ids
> }
>
> (perhaps the 'col_ids' feature could be left out initially)
>
> message SplitKeyRangeResponse {
>   repeated bytes split_pks
>   repeated int64 size_estimates
> }
>
> The advantages I see for this design are:
>
>- no need to centralize the split metadata (so we can be more
>fine-grained without worrying about master load or memory)
>- for the purposes of locality scheduling, we only need the partition
>locations, so the "planner" (spark driver or impala front-end) doesn't care
>about the sub-partition splits. The individual tasks can then contact their
>local tserver to further sub-divide the range
>- on-demand computation of the splits means we will be able to do a
>better job of lazily opening cold tablets later.
>
> What do you think?
>
>
> To view, visit change 10406 . To
> unsubscribe, visit settings .
> Gerrit-Project: kudu
> Gerrit-Branch: master
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
> Gerrit-Change-Number: 10406
> Gerrit-PatchSet: 4
> Gerrit-Owner: Xu Yao 
> Gerrit-Reviewer: Kudu Jenkins
> Gerrit-Reviewer: Todd Lipcon 
> Gerrit-Comment-Date: Thu, 17 May 2018 17:13:07 +
> Gerrit-HasComments: No
>


[kudu-CR] KUDU-2437 Generate ScanToken from small chunks in tablet

2018-05-17 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10406 )

Change subject: KUDU-2437 Generate ScanToken from small chunks in tablet
..


Patch Set 4:

Hi Xu. Thanks for working on this! It will be really great to decouple the 
available parallelism from the number of partitions.

I have a couple concerns about the design in this patch, though.

1) periodic reporting of the bounds to the master: for our eventual scalability 
goals of thousands of nodes, each with thousands of tablets, I think this may 
be too much information to try to keep in memory on the master and to report on 
a periodic basis. Especially considering that many tablets may be cold (bounds 
not changing), it's a lot of redundant processing. Additionally, with it kept 
only in memory (not persisted) this means that shortly following a restart or 
failover, the master will not have the chunk bounds populated and the query 
plans submitted by users may change dramatically.

2) including the bounds in every GetTabletLocations call is also somewhat 
expensive from a size/performance perspective. In the case of writes or scans, 
they aren't needed. It would be better to avoid the expense for these common 
cases and only worry about the bounds for the specific use case you are 
targeting (generating splits for tasks)


3) from a security perspective, I'm not sure it's safe to include the bounds 
information in the 'getTabletLocations' RPC response. When we start adding more 
fine-grained authorization it's likely that we will want to provide location 
information to administrators/tools (eg to facilitate balancing or other ops 
workflows) but those administrators may not have access to read the underlying 
data. The bounds themselves can likely expose user data (eg imagine that the 
primary key is an email address or government ID number that should not be 
exposed)


I'd like to consider an alternate design in which we add a new RPC to the 
tablet server itself, something like:

message SplitKeyRangeRequest {
  bytes tablet_id
  bytes start_pk
  bytes end_pk

  // Number of bytes to try to return in each chunk. This is a hint.
  // The tablet server may return chunks larger or smaller than this value.
  int64 target_chunk_size_bytes

  // The columns to consider when chunking.
  // If specified, then the size estimate used for 'target_chunk_size_bytes'
  // should only include these columns. This can be used if a query will
  // only scan a certain subset of the columns.
  repeated int32 col_ids
}

(perhaps the 'col_ids' feature could be left out initially)

message SplitKeyRangeResponse {
  repeated bytes split_pks
  repeated int64 size_estimates
}


The advantages I see for this design are:
- no need to centralize the split metadata (so we can be more fine-grained 
without worrying about master load or memory)
- for the purposes of locality scheduling, we only need the partition 
locations, so the "planner" (spark driver or impala front-end) doesn't care 
about the sub-partition splits. The individual tasks can then contact their 
local tserver to further sub-divide the range
- on-demand computation of the splits means we will be able to do a better job 
of lazily opening cold tablets later.

What do you think?


--
To view, visit http://gerrit.cloudera.org:8080/10406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
Gerrit-Change-Number: 10406
Gerrit-PatchSet: 4
Gerrit-Owner: Xu Yao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 17 May 2018 17:13:07 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2437 Generate ScanToken from small chunks in tablet

2018-05-15 Thread Xu Yao (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10406

to look at the new patch set (#4).

Change subject: KUDU-2437 Generate ScanToken from small chunks in tablet
..

KUDU-2437 Generate ScanToken from small chunks in tablet

When reading data in a kudu table using spark, if there is a
large amount of data in the tablet, reading the data takes a long time.

The reason is that KuduRDD uses a tablet to generate
the scanToken, so a spark task needs to process all the data in a tablet. So:

1. TS report the DRS bounds info to Master
2. Client get the bounds info from Master
3. Client generate the scanToken by bounds info of tablet(set 
LowerBoundPrimaryKey and UpperBoundPrimaryKey)

Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/LocatedTablet.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tserver/heartbeater.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
16 files changed, 310 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/10406/4
--
To view, visit http://gerrit.cloudera.org:8080/10406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
Gerrit-Change-Number: 10406
Gerrit-PatchSet: 4
Gerrit-Owner: Xu Yao 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-2437 Generate ScanToken from small chunks in tablet

2018-05-15 Thread Xu Yao (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10406

to look at the new patch set (#3).

Change subject: KUDU-2437 Generate ScanToken from small chunks in tablet
..

KUDU-2437 Generate ScanToken from small chunks in tablet

When reading data in a kudu table using spark, if there is a
large amount of data in the tablet, reading the data takes a long time.

The reason is that KuduRDD uses a tablet to generate
the scanToken, so a spark task needs to process all the data in a tablet. So:

1. Split the Tablet into many small chunks by some primary keys
2. Report the primary keys to Master
3. Client get the primary keys from Master, and generate the scanToken

Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/LocatedTablet.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tserver/heartbeater.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
16 files changed, 309 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/10406/3
--
To view, visit http://gerrit.cloudera.org:8080/10406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
Gerrit-Change-Number: 10406
Gerrit-PatchSet: 3
Gerrit-Owner: Xu Yao 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-2437 Generate ScanToken from small chunks in tablet

2018-05-15 Thread Xu Yao (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10406

to look at the new patch set (#2).

Change subject: KUDU-2437 Generate ScanToken from small chunks in tablet
..

KUDU-2437 Generate ScanToken from small chunks in tablet

When reading data in a kudu table using spark, if there is a
large amount of data in the tablet, reading the data takes a long time.

The reason is that KuduRDD uses a tablet to generate
the scanToken, so a spark task needs to process all the data in a tablet. So:

1. Split the Tablet into many small chunks by some primary keys
2. Report the primary keys to Master
3. Client get the primary keys from Master, and generate the scanToken

Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/LocatedTablet.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tserver/heartbeater.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
16 files changed, 306 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/10406/2
--
To view, visit http://gerrit.cloudera.org:8080/10406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
Gerrit-Change-Number: 10406
Gerrit-PatchSet: 2
Gerrit-Owner: Xu Yao 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-2437 Generate ScanToken from small chunks in tablet

2018-05-15 Thread Xu Yao (Code Review)
Xu Yao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10406


Change subject: KUDU-2437 Generate ScanToken from small chunks in tablet
..

KUDU-2437 Generate ScanToken from small chunks in tablet

When reading data in a kudu table using spark, if there is a
large amount of data in the tablet, reading the data takes a long time.

The reason is that KuduRDD uses a tablet to generate
the scanToken, so a spark task needs to process all the data in a tablet. So:

1. Split the Tablet into many small chunks by some primary keys
2. Report the primary keys to Master
3. Client get the primary keys from Master, and generate the scanToken

Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/LocatedTablet.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tserver/heartbeater.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
16 files changed, 305 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/10406/1
--
To view, visit http://gerrit.cloudera.org:8080/10406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9ec4395919f4b54102e458ef5154334c08412e8a
Gerrit-Change-Number: 10406
Gerrit-PatchSet: 1
Gerrit-Owner: Xu Yao