[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 12: This post is now live: http://kudu.apache.org/2018/09/26/index-skip-scan-optimization-in-kudu.html -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 12 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 26 Sep 2018 17:57:21 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Blogpost describing index skip scan optimization. Link to the version with images: https://github.com/AnupamaGupta01/kudu/blob/blogpost-2/_posts/2018-09-25-index-skip-scan-optimization-in-kudu.md Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Reviewed-on: http://gerrit.cloudera.org:8080/11263 Reviewed-by: Mike Percy Tested-by: Mike Percy --- A _posts/2018-09-26-index-skip-scan-optimization-in-kudu.md A img/index-skip-scan/example-table.png A img/index-skip-scan/skip-scan-example-table.png A img/index-skip-scan/skip-scan-performance-graph.png 4 files changed, 114 insertions(+), 0 deletions(-) Approvals: Mike Percy: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: merged Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 12 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 11: Verified+1 Code-Review+2 Rendered locally with site_tool jekyll serve and it looks good. I'm about to push this live. -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 11 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 26 Sep 2018 17:50:21 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has uploaded a new patch set (#11) to the change originally created by Anupama Gupta. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Blogpost describing index skip scan optimization. Link to the version with images: https://github.com/AnupamaGupta01/kudu/blob/blogpost-2/_posts/2018-09-25-index-skip-scan-optimization-in-kudu.md Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e --- A _posts/2018-09-26-index-skip-scan-optimization-in-kudu.md A img/index-skip-scan/example-table.png A img/index-skip-scan/skip-scan-example-table.png A img/index-skip-scan/skip-scan-performance-graph.png 4 files changed, 114 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/11263/11 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 11 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 10: I noticed yesterday afternoon that I got distracted and didn't post this. I'm going to wrangle up a new filename and push it out right now. -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 10 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Wed, 26 Sep 2018 17:25:57 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 10: Verified+1 awesome, thanks Anupama -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 10 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Tue, 25 Sep 2018 08:01:02 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 10: I'm planning on pushing this out and tweeting it out from @ApacheKudu tomorrow morning (9/25) California time. -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 10 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 24 Sep 2018 23:32:29 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 10 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 24 Sep 2018 23:31:40 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Hello Alexey Serbin, Mike Percy, Attila Bukor, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11263 to look at the new patch set (#10). Change subject: Blogpost describing index skip scan optimization. .. Blogpost describing index skip scan optimization. Link to the version with images: https://github.com/AnupamaGupta01/kudu/blob/blogpost-2/_posts/2018-09-25-index-skip-scan-optimization-in-kudu.md Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e --- A _posts/2018-09-25-index-skip-scan-optimization-in-kudu.md A img/index-skip-scan/example-table.png A img/index-skip-scan/skip-scan-example-table.png A img/index-skip-scan/skip-scan-performance-graph.png 4 files changed, 114 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/11263/10 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 10 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 9: Sorry for the confusion. I have renamed the file in the GitHub version to make it consistent with the most recent update to this change. The updated file name is '2018-09-25-index-skip-scan-optimization-in-kudu.md'. -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 24 Sep 2018 21:40:25 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 9:
> Patch Set 9:
>
> > Patch Set 9: Verified-1
> >
> > we should probably change the date in the filename as it would be below
> > Mac's pipeline post otherwise, so I'm setting verified to -1 for now.
>
> Attila, I don't understand what you're saying here. Can you clarify?
oh sorry. I meant that this post is dated 2018-08-17 and we already have a published one from 2018-09-11, meaning this post wouldn't go to the top, but to the second place instead. The file should simply be renamed to 2018-09-24-index-skip-scan-optimization-in-kudu.md to make sure it goes to the top.
-- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 24 Sep 2018 21:23:53 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 9:
> Patch Set 9: Verified-1
>
> we should probably change the date in the filename as it would be below Mac's
> pipeline post otherwise, so I'm setting verified to -1 for now.
Attila, I don't understand what you're saying here. Can you clarify?
-- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 24 Sep 2018 20:53:13 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 9: Verified-1 we should probably change the date in the filename as it would be below Mac's pipeline post otherwise, so I'm setting verified to -1 for now. -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 24 Sep 2018 20:34:23 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 24 Sep 2018 18:54:33 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 9: Code-Review+2 Thanks! I'll work on getting this posted. -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 24 Sep 2018 16:58:11 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 8: (8 comments) http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@10 PS8, Line 10: > nit: I would add something along these lines here to help transition to the Done http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@36 PS8, Line 36: contains > nit: "only contains the" Done http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@74 PS8, Line 74: http://latex.codecogs.com/gif.download?%5Csqrt%20%7B%20%5C%23rows%5C%20in%5C%20tablet%20%7D > that would work too, yeah Done http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@76 PS8, Line 76: decide > s/decide/have tentatively chosen/ Done http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@77 PS8, Line 77: http://latex.codecogs.com/gif.download?%5Csqrt%20%7B%20%5C%23rows%5C%20in%5C%20tablet%20%7D > maybe, simple text representation would fit as well: Done http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@78 PS8, Line 78: take > s/take/project/ Done http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@88 PS8, Line 88: current implementation > s/current implementation/implementation in the patch/ Done http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@109 PS8, Line 109: [[2]](https://oracle-base.com/articles/9i/index-skip-scanning/): Index Skip Scanning - Oracle Database : : 
[[3]](https://www.sqlite.org/optoverview.html#skipscan): Skip Scan - SQLite > I see it now; feel free to ignore this. Done -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 8 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Sun, 23 Sep 2018 14:25:00 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Hello Alexey Serbin, Mike Percy, Attila Bukor, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11263 to look at the new patch set (#9). Change subject: Blogpost describing index skip scan optimization. .. Blogpost describing index skip scan optimization. Link to the version with images: https://github.com/AnupamaGupta01/kudu/blob/blogpost-2/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e --- A _posts/2018-09-25-index-skip-scan-optimization-in-kudu.md A img/index-skip-scan/example-table.png A img/index-skip-scan/skip-scan-example-table.png A img/index-skip-scan/skip-scan-performance-graph.png 4 files changed, 114 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/11263/9 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 9 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@74 PS8, Line 74: http://latex.codecogs.com/gif.download?%5Csqrt%20%7B%20%5C%23rows%5C%20in%5C%20tablet%20%7D > As an alternative approach, consider simple text representation that would work too, yeah -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 8 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Tue, 18 Sep 2018 23:34:49 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@109 PS8, Line 109: [[2]](https://oracle-base.com/articles/9i/index-skip-scanning/): Index Skip Scanning - Oracle Database : : [[3]](https://www.sqlite.org/optoverview.html#skipscan): Skip Scan - SQLite > I'm not seeing references to these links in this blog post I see it now; feel free to ignore this. -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 8 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Tue, 18 Sep 2018 23:34:27 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@74 PS8, Line 74: http://latex.codecogs.com/gif.download?%5Csqrt%20%7B%20%5C%23rows%5C%20in%5C%20tablet%20%7D > download this, check it in, and include it in the gerrit review please As an alternative approach, consider simple text representation sqrt(number_of_rows_in_tablet) Would fit as well? http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@77 PS8, Line 77: http://latex.codecogs.com/gif.download?%5Csqrt%20%7B%20%5C%23rows%5C%20in%5C%20tablet%20%7D > please check this in maybe, simple text representation would fit as well: sqrt(number_of_rows_in_tablet) ? -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 8 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Tue, 18 Sep 2018 22:58:04 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 8: (8 comments) Sorry for the delay in reviewing this. This looks good, I just have a few additional nitpick points of feedback and then I think this is ready to post. Since we are getting ready to post this, we should change the date of the blog post to a date in the future. How about 2018-09-25 (next Tuesday), when we can post this? http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@10 PS8, Line 10: nit: I would add something along these lines here to help transition to the next paragraph: I wanted to share my experience and the progress we've made so far on the approach. http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@36 PS8, Line 36: contains nit: "only contains the" http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@74 PS8, Line 74: http://latex.codecogs.com/gif.download?%5Csqrt%20%7B%20%5C%23rows%5C%20in%5C%20tablet%20%7D download this, check it in, and include it in the gerrit review please http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@76 PS8, Line 76: decide s/decide/have tentatively chosen/ http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@77 PS8, Line 77: http://latex.codecogs.com/gif.download?%5Csqrt%20%7B%20%5C%23rows%5C%20in%5C%20tablet%20%7D please check this in http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@78 PS8, Line 78: take s/take/project/ 
http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@88 PS8, Line 88: current implementation s/current implementation/implementation in the patch/ http://gerrit.cloudera.org:8080/#/c/11263/8/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@109 PS8, Line 109: [[2]](https://oracle-base.com/articles/9i/index-skip-scanning/): Index Skip Scanning - Oracle Database : : [[3]](https://www.sqlite.org/optoverview.html#skipscan): Skip Scan - SQLite I'm not seeing references to these links in this blog post -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 8 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Tue, 18 Sep 2018 22:28:58 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 8: Code-Review+2 Looks good! Thank you for the post! -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 8 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 17 Sep 2018 17:54:20 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Hello Alexey Serbin, Mike Percy, Attila Bukor, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11263 to look at the new patch set (#8). Change subject: Blogpost describing index skip scan optimization. .. Blogpost describing index skip scan optimization. Link to the version with images: https://github.com/AnupamaGupta01/kudu/blob/blogpost-2/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e --- A _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md A img/index-skip-scan/example-table.png A img/index-skip-scan/skip-scan-example-table.png A img/index-skip-scan/skip-scan-performance-graph.png 4 files changed, 113 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/11263/8 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 8 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 8: (7 comments) http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@7 PS7, Line 7: team > nit: lower-case "team" Done http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@27 PS7, Line 27: table > nit: lower-case "table" Done http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@45 PS7, Line 45: . We will refer to it as the : "prefix column" and its specific value as the "prefix k > nit: drop the parens and start a new sentence instead. Done http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@47 PS7, Line 47: > nit: remove comma, maybe replace with "that" Done http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@82 PS7, Line 82: : Conclusion : == > No where does this mention that, as implemented, this works for equality pr Done http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@98 PS7, Line 98: roughly enjo > nit: full-fledged Done http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@99 PS7, Line 99: right from underst > nit: "of the skip scan approach" or "of the skip scan optimization" Done -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 8 
Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Thu, 13 Sep 2018 22:16:44 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 7: Code-Review+1 (7 comments) Some tiny nits and one real suggestion, but otherwise LGTM. http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@7 PS7, Line 7: Team nit: lower-case "team" http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@27 PS7, Line 27: Table nit: lower-case "table" http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@45 PS7, Line 45: (we will refer to it as : "prefix column" and its specific value as "prefix key") nit: drop the parens and start a new sentence instead. Also nit: "as the 'prefix column' and its specific value as the 'prefix key.'" http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@47 PS7, Line 47: , nit: remove comma, maybe replace with "that" http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@82 PS7, Line 82: : Conclusion : == Nowhere does this mention that, as implemented, this works for equality predicates. Should probably mention that.
http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@98 PS7, Line 98: full fledged nit: full-fledged http://gerrit.cloudera.org:8080/#/c/11263/7/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@99 PS7, Line 99: skip scan approach nit: "of the skip scan approach" or "of the skip scan optimization" -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 7 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Thu, 13 Sep 2018 17:57:03 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 7: (13 comments) Many thanks for the comments. Please take a look. http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@39 PS6, Line 39: table > nit: tablet, here and elsewhere Done http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@45 PS6, Line 45: (we will refer to it as : "prefix column" and its specific value as "prefix key"). > nit: since we're not using these as a variable names, but rather as definit Done http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@48 PS6, Line 48: Therefore, we can use the index to skip to the rows that have distinct prefix keys, : and also satisfy the predicate on the `tstamp` column. > nit: maybe drop the ** around "skip" here, since you do it down below anywa Done http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@58 PS6, Line 58: `host = helium` > nit: would be nice if the entire thing were in backticks, since it's a cond You are right. Made the change. http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@60 PS6, Line 60: satisfy the predicate, and we > nit: probably not needed Done http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@59 PS6, Line 59: . At that : point we would > nit: reword "until the predicate no longer matches. 
At that point we would Done http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@67 PS6, Line 67: tio > nit: this is a little distracting, below too. Let's just keep it singular s Done http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@73 PS6, Line 73: o get worse with respect to the full tablet scan performance when the prefix column cardinality > This and below are being rendered weirdly by github. Would like to confirm Yes, it works fine with jekyll. (Link to this screen shot - https://raw.githubusercontent.com/AnupamaGupta01/kudu-1/gh-pages-staging/img/index-skip-scan/equation.png) http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@74 PS6, Line 74: C%20tablet%20%7D). : Therefore, in order to use skip scan perf > "consistent performance in cases of large prefix column cardinality" Done http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@93 PS6, Line 93: > nit: probably not needed, below too. Done http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@94 PS6, Line 94: Range pr > nit: In-list Done http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@98 PS6, Line 98: orki > nit: team Done http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@101 PS6, Line 101: : References : == : : [[1]](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/42851.pdf): Gupta, Ashish, et al. "Mesa: : Geo-replicated, near real-time, scalable data warehousing." Proceedings of the VLDB Endowment 7.12 (2014): 1259-1270. : : [[2]](https://oracle-base.com/articles/9i/index-skip-scanning/): Index Skip Scanning - Oracle Database : > It's really up to you, but WDYT about just linking these in-line? This is a Thanks, I see your point. 
I think that the current section for references looks fine after incorporating Alexey's suggestions on the same (in Patch 4, L62). -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 7 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Thu, 13 Sep 2018 03:35:44 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Hello Alexey Serbin, Mike Percy, Attila Bukor, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11263 to look at the new patch set (#7). Change subject: Blogpost describing index skip scan optimization. .. Blogpost describing index skip scan optimization. Link to the version with images: https://github.com/AnupamaGupta01/kudu/blob/blogpost-2/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e --- A _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md A img/index-skip-scan/example-table.png A img/index-skip-scan/skip-scan-example-table.png A img/index-skip-scan/skip-scan-performance-graph.png 4 files changed, 112 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/11263/7 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 7 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 6: (14 comments) Almost all nits pretty much. Looking good! http://gerrit.cloudera.org:8080/#/c/11263/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11263/6//COMMIT_MSG@9 PS6, Line 9: Link to the version with images: Very nice :) http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@39 PS6, Line 39: table nit: tablet, here and elsewhere http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@45 PS6, Line 45: (we will refer to it as : `prefix column` and it's specific value as `prefix key`) nit: since we're not using these as variable names, but rather as definitions, we should use quotation marks. Also drop the apostrophe in "its". I.e.: we will refer to it as the "prefix column" and its specific value as the "prefix key" http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@48 PS6, Line 48: Therefore, we can use the index to **skip** to the rows that have distinct prefix keys, : and also satisfy the predicate on the `tstamp` column. nit: maybe drop the ** around "skip" here, since you do it down below anyway. http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@58 PS6, Line 58: `host` = helium nit: would be nice if the entire thing were in backticks, since it's a condition? Seems a little awkward, this mix of backticks and no backticks. WDYT? 
http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@60 PS6, Line 60: such as `ubuntu`, `westeros` nit: probably not needed http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@59 PS6, Line 59: with : this prefix key nit: reword "until the predicate no longer matches. At that point we would know that no more rows with `host = helium` will satisfy the predicate, and we can skip to the next prefix key." http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@67 PS6, Line 67: (s) nit: this is a little distracting, below too. Let's just keep it singular, since you call out at the end that it can be any number of prefix columns anyway. http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@73 PS6, Line 73: ![](http://latex.codecogs.com/gif.download?%5Csqrt%20%7B%20%5C%23rows%5C%20in%5C%20tablet%20%7D). This and below are being rendered weirdly by GitHub. Would like to confirm this doesn't happen with Jekyll. http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@74 PS6, Line 74: consistent performance with : respect to the prefix columns cardinality "consistent performance in cases of large prefix column cardinality" http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@93 PS6, Line 93: on the non-first key columns(s) nit: probably not needed, below too. 
http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@94 PS6, Line 94: IN list nit: In-list http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@98 PS6, Line 98: Team nit: team http://gerrit.cloudera.org:8080/#/c/11263/6/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@101 PS6, Line 101: References : == : : [[1]](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/42851.pdf): Gupta, Ashish, et al. "Mesa: : Geo-replicated, near real-time, scalable data warehousing." Proceedings of the VLDB Endowment 7.12 (2014): 1259-1270. : : [[2]](https://oracle-base.com/articles/9i/index-skip-scanning/): Index Skip Scanning - Oracle Database : : [[3]](https://www.sqlite.org/optoverview.html#skipscan): Skip Scan - SQLite It's really up to you, but WDYT about just linking these in-line? This is a webpage after all :) -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 6 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer:
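The cardinality threshold discussed in the comments above (skip scan stops paying off once the prefix column cardinality exceeds the square root of the number of rows in the tablet) can be illustrated with a small Python sketch. The function name `should_use_skip_scan` and its parameters are hypothetical, chosen only for illustration; this is not Kudu's actual code, just the rule of thumb as stated in the thread.

```python
import math


def should_use_skip_scan(prefix_cardinality: int, rows_in_tablet: int) -> bool:
    """Hypothetical helper: apply the sqrt(#rows in tablet) rule of thumb.

    Returns True while the number of distinct prefix keys is small enough
    that skipping between them is expected to beat a full tablet scan.
    """
    if rows_in_tablet <= 0:
        return False
    # Assumption taken from the review thread: skip scan is worthwhile only
    # while the prefix cardinality does not exceed sqrt(#rows in tablet).
    return prefix_cardinality <= math.sqrt(rows_in_tablet)
```

With 10 million rows per tablet, the row count mentioned in the thread, the square root is roughly 3162, so under this sketch a prefix column with 1000 distinct values keeps skip scan enabled and one with 5000 distinct values does not.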
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/11263/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11263/4//COMMIT_MSG@6 PS4, Line 6: : Blogpost describing index skip scan optimization. : > Thanks for this Andrew. I am still not sure why images are not getting rend Resolved this issue now. Added a link to the rendered version. -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 6 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Sun, 09 Sep 2018 20:13:26 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 6: (17 comments) http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@73 PS4, Line 73: exceeds ![](http://latex.codecogs.com/gif.download?%5Csqrt%20%7B%20%5C%23rows%5C%20in%5C%20tablet%20%7D). : Therefore, in order to use skip scan performance benefits when possible and maintain a consistent performance with > I think it's the number of rows in the CFileSet, which I think is also the You are right ! How about rewording this to "rows in tablet" ? http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@9 PS5, Line 9: index skip scan (a.k. > It's great that you found another reference to the same idea in the google' Sounds good. Done. http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@13 PS5, Line 13: Let's b > nit: probably don't need this Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@40 PS5, Line 40: an option > nit: probably don't need this Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@38 PS5, Line 38: : Instead, a full table scan is done by default. Other databases may optimize such scans by building secondary indexes : (though it might be redundant to build one on one of the > Let's stick with a single concrete example, say `tstamp`. 
Then we can point Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@41 PS5, Line 41: given its lack of secondary index support. : : The question is, can Kudu do better than a full table scan here? : > nit: I think this would read better after L45. E.g. Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@47 PS5, Line 47: in the in > nit: since this is a concrete example, we know there is only one column bef Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@50 PS5, Line 50: : {% highlight SQL %} > nit: reword as "to **skip** to the rows that have distinct prefix keys, and Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@61 PS5, Line 61: ce, this metho > nit: "Kudu tablet" or "tablet server" or "Kudu" Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@61 PS5, Line 61: as **skip scan optimization**[2-3]. : : Performance > Maybe reverse the order of **skip** and **scan**, since the name is "skip s Done. You are correct, I have rephrased the sentence to better clarify this point. http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@70 PS5, Line 70: > nit: add "the" in front of "Lower" and "better" Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@71 PS5, Line 71: nts, on up to 10 million rows per t > I seem to recall a plot that showed the performance without the dynamic dis That's a good point. Unfortunately, I do not have the backup of that slide. http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@77 PS5, Line 77: > nit: skips? 
for consistency with the "skip" and "scan" terminology Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@77 PS5, Line 77: It will be an in > nit: I think it's clear enough that this may refer to multiple, so maybe ju Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@89 PS5, Line 89: > nit: one (`host`) Done http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@104 PS5, Line 104: [[1 > Do you feel good about adding one more reference? I think https://www.sqli Thanks so much for this suggestion. I have added this reference too. http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@105 PS5, Line 105: Geo- > nit: usually in the reference section they use '[x]' where it's possible to Thank you for pointing this out. Done. -- To view, visit
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Hello Alexey Serbin, Mike Percy, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11263 to look at the new patch set (#6). Change subject: Blogpost describing index skip scan optimization. .. Blogpost describing index skip scan optimization. Link to the version with images: https://github.com/AnupamaGupta01/kudu/blob/blogpost-2/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e --- A _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md A img/index-skip-scan/example-table.png A img/index-skip-scan/skip-scan-example-table.png A img/index-skip-scan/skip-scan-performance-graph.png 4 files changed, 111 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/11263/6 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 6 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Mike Percy
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 5: (4 comments) Great progress! Some more nits in addition to what Andrew already pointed at. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@73 PS4, Line 73: : Based on our experiments, on up to 10 million rows per tablet (as shown below), we found that the skip scan performa > 1) Yes, these experiments were based on table schema and query pattern ment I see. Thank you for the information. http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@9 PS5, Line 9: [index skip scan][1]. It's great that you found another reference to the same idea in Google's paper [2]. Do you think it's worth mentioning the other name for the technique? Something like 'index skip scan (a.k.a. scan-to-seek, see section 4.1 in [2])'. http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@104 PS5, Line 104: [1] Do you feel good about adding one more reference? I think https://www.sqlite.org/optoverview.html#skipscan is also a good one. But it's up to you. http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@105 PS5, Line 105: [2]: nit: usually in the reference section they use '[x]' where it's possible to follow the link simply by clicking on it. To enable those square brackets to appear in the rendered output, you need to duplicate them, e.g. 
[[1]](https://my.url.io/) Mega-turbo resource Or do you want them to be just numbers followed by colons? -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 5 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Tue, 04 Sep 2018 22:33:55 + Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 5: > Patch Set 5: > > (14 comments) > > Hrm, I'm not sure why it's not rendering on github for you. Maybe post a > screenshot of the rendered jekyll? That'd be helpful too. P.S. thanks for updating this! Looking much better so far :) -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 5 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Anupama Gupta Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Tue, 04 Sep 2018 19:43:14 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 5: (14 comments) Hrm, I'm not sure why it's not rendering on GitHub for you. Maybe post a screenshot of the rendered Jekyll? That'd be helpful too. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@73 PS4, Line 73: : Based on our experiments, on up to 10 million rows per tablet (as shown below), we found that the skip scan performa > Added explanation about how we came to using this simple heuristic. Yes, it I think it's the number of rows in the CFileSet, which I think is also the number of rows in the B-tree, but it isn't equal to the number of rows in the table (since that spans multiple tablets). http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@13 PS5, Line 13: Example nit: probably don't need this http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@38 PS5, Line 38: first key column(s) : (`tstamp` and/or `clusterid`)? In this case, since the column value might be present anywhere in the index structure, : the current query execution plan does not use the index. Let's stick with a single concrete example, say `tstamp`. Then we can point to the example above: "In the above case, the `tstamp` values are sorted with respect to `host`, but are not globally sorted, and as such, it's non-trivial to use the index to filter rows." 
http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@40 PS5, Line 40: by default nit: probably don't need this http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@41 PS5, Line 41: To optimize this scan time, a possible solution is to build secondary index on the required key column (although, it might be : redundant to build secondary index on composite key column). : However, we do not consider this solution as Kudu does not support secondary indexes yet. : nit: I think this would read better after L45. E.g. Other databases may optimize such scans by building secondary indexes (though it might be redundant to build one on one of the primary keys). However, this isn't an option for Kudu, given its lack of secondary index support. The question is, can Kudu do better than a full table scan here? http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@47 PS5, Line 47: column(s) nit: since this is a concrete example, we know there is only one column before `tstamp` http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@50 PS5, Line 50: seek to the rows containing distinct prefix keys : and satisfying the query predicate on the `tstamp` column. nit: reword as "to **skip** to the rows that have distinct prefix keys, and also satisfy the predicate on the `tstamp` column." 
http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@61 PS5, Line 61: query server nit: "Kudu tablet" or "tablet server" or "Kudu" http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@61 PS5, Line 61: **scan** all rows for which `host` = `helium` and `tstamp` = 100 and consequently, : **skip** all the rows for which host = `helium` and `tstamp` != 100 : (holds true for all distinct keys of `host` such as `ubuntu`, `westeros`). Maybe reverse the order of **skip** and **scan**, since the name is "skip scan"? Also, isn't the actual order to skip to a distinct prefix that may match a predicate, and then scan through rows until we know that the rows won't match the predicate within this prefix key? http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@70 PS5, Line 70: Lower the prefix column cardinality, better the skip scan performance nit: add "the" in front of "Lower" and "better" http://gerrit.cloudera.org:8080/#/c/11263/5/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@71 PS5, Line 71: skip scan is not a viable approach. I seem to recall a plot that showed the performance without the dynamic disabling functionality. Do you still have that around? I think that would be interesting to put up since it exemplifies this
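The skip-then-scan order raised in the Line 61 comment above (skip to a distinct prefix key, seek to the first row that could match, scan until the predicate stops matching, then skip to the next prefix key) can be sketched in Python over an in-memory sorted list. This is only an illustration under stated assumptions: the composite key order `(host, tstamp, clusterid)`, the sample rows, and the function name `skip_scan` are hypothetical stand-ins, and `bisect` on a list stands in for seeking in the primary key index. It is not Kudu's implementation.

```python
import bisect


def skip_scan(rows, tstamp):
    """Return rows whose tstamp equals the given value.

    `rows` is assumed to be a list of (host, tstamp, clusterid) tuples,
    sorted by that composite key, mimicking a primary key index.
    """
    hosts = [r[0] for r in rows]  # prefix column, sorted because rows are
    matches = []
    i = 0
    while i < len(rows):
        host = rows[i][0]  # current distinct prefix key
        # Seek: first row >= (host, tstamp) within this prefix key.
        j = bisect.bisect_left(rows, (host, tstamp), i)
        # Scan: collect rows until the predicate no longer matches.
        while j < len(rows) and rows[j][0] == host and rows[j][1] == tstamp:
            matches.append(rows[j])
            j += 1
        # Skip: jump past the remaining rows of this prefix key.
        i = bisect.bisect_right(hosts, host, j)
    return matches


# Hypothetical sample data, sorted by (host, tstamp, clusterid).
sample_rows = [
    ("helium", 50, 1), ("helium", 100, 1), ("helium", 100, 2),
    ("helium", 200, 1), ("ubuntu", 100, 3), ("ubuntu", 300, 3),
    ("westeros", 10, 4), ("westeros", 150, 4),
]
result = skip_scan(sample_rows, 100)
```

Each pass through the outer loop performs two seeks per distinct `host` value rather than touching every row, which is why the approach depends on the prefix column having few distinct values.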
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 5: (22 comments) Please take a look. http://gerrit.cloudera.org:8080/#/c/11263/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11263/4//COMMIT_MSG@6 PS4, Line 6: : Blogpost describing index skip scan optimization. : > In reviewing blogposts, it's generally helpful to post a link to a rendered Thanks for this Andrew. I am still not sure why images are not getting rendered here -https://github.com/AnupamaGupta01/kudu/blob/blogpost-2/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md. Although I do see the rendered version locally, using jekyll. Please let me know if I am missing something. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@9 PS4, Line 9: [index skip scan][1]. > This already seems like it's going a bit too far into implementation detail Got it. Done. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@11 PS4, Line 11: : > Probably don't need this. Done http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@14 PS4, Line 14: > What do these do? This is used to show the beginning excerpts of the post. I misunderstood earlier that it is used for newline. Moved this tag after the beginning two lines and removed this from elsewhere. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@31 PS4, Line 31: > Maybe, add a reference (like https://en.wikipedia.org/wiki/B-tree) in-line Makes sense. Added an in-line reference. 
http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@31 PS4, Line 31: > nit: "a B-tree", and no need to capitalize "Tree" below Done http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@33 PS4, Line 33: `metric > nit: perhaps using ``s would be more reasonable here (ie. `host`). Then it' Done http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@32 PS4, Line 32: In this case, by default, Kudu internally builds a primary key index (implemented as a : [B-tree](https://en.wikipedia.org/wiki/B-tree)) for the table `metrics`. : As shown in the table above, the ind > IMO this doesn't convey the idea that the data is sorted by the composite o Thanks for this suggestion. I moved the example dataset from below and used it as a reference to elaborate on the points you mentioned. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@32 PS4, Line 32: In this case, by default, Kudu internally builds a primary key index (implemented as a : [B-tree](https://en.wikipedia.org/wiki/B-tree)) for the table `metrics`. : As shown in the table above, the ind > +1 all points mentioned by Andrew here. Done http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@39 PS4, Line 39: ? > nit: here and elsewhere, no need for spaces before punctuation marks Done http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@40 PS4, Line 40: the current query execution plan does not use the index. Instead, a full tab > I'm not sure this gives a clear explanation as for the reason to perform a Rephrased this paragraph to clarify this point. 
http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@45 PS4, Line 45: The question is, > In general, I think the index skip scan optimization is not the only answer Done http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@46 PS4, Line 46: > The crux of this is the prefixes are also sorted, and all rows of a given p Done http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@53 PS4, Line 53: For example, consider the query: > nit: maybe, to be in sync with the CREATE TABLE statement above, write SQL Done http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@51 PS4, Line 51: and satisfying the query predicate on the `tstamp` column. : : For example, consider the query: : {% highlight SQL %} : SELECT clusterid FROM metrics WHERE tstamp = 100; : {% endhighlight %} : > Ah, so you _do_ have an example! I
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Hello Alexey Serbin, Mike Percy, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11263 to look at the new patch set (#5). Change subject: Blogpost describing index skip scan optimization. .. Blogpost describing index skip scan optimization. Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e --- A _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md A img/index-skip-scan/example-table.png A img/index-skip-scan/skip-scan-example-table.png A img/index-skip-scan/skip-scan-performance-graph.png 4 files changed, 106 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/11263/5 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 5 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Mike Percy
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 4: I like the article. One thing I think we should do is mention that this is a work-in-progress patch and link to the Gerrit review so people can follow along if they want. -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 4 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Thu, 30 Aug 2018 20:12:47 + Gerrit-HasComments: No
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@31 PS4, Line 31: B-Tree Maybe, add a reference (like https://en.wikipedia.org/wiki/B-tree) in-line or in a separate 'References' section? http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@32 PS4, Line 32: The data is sorted lexicographically starting from the leftmost primary key column and stored in the B-Tree leaf nodes. : Therefore, when the user query contains the first key column ("host"), Kudu uses the primary key range push down : operation to optimize the scan time. > IMO this doesn't convey the idea that the data is sorted by the composite o +1 all points mentioned by Andrew here. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@40 PS4, Line 40: (since the primary key index is sorted on the basis of the first key column) I'm not sure this gives a clear explanation as for the reason to perform a full table scan. Could you update this to explain why simply using the primary index we cannot instantly locate the desired rows? http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@45 PS4, Line 45: The answer is yes In general, I think the index skip scan optimization is not the only answer. In other databases it's possible to build secondary indices, and that might work even better (of course it depends on the read/write ratio for the use-case and availability of space to build additional index). 
I think it's worth mentioning that building a secondary index is not an option here, since Kudu does not support secondary indices yet. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@53 PS4, Line 53: select clusterid from metrics where tstamp = 100 nit: maybe, to be in sync with the CREATE TABLE statement above, write SQL keywords in capital letters. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@62 PS4, Line 62: popularly known as index skip scan optimization can skip all the rows for which host = "helium" and tstamp != 100 > nit: it's great to get to the point that we call this a "skip scan". To dri Maybe it's worth mentioning 'skip scan' earlier, where you give a short overview of the idea behind the skip scan optimization. Also, as for addressing the 'popularity' of the term, I think that adding some references in a separate section for various databases that implement that optimization might be useful (e.g., one of those links might be https://oracle-base.com/articles/9i/index-skip-scanning). http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@73 PS4, Line 73: Based on experiments on upto 10 million rows per tablet, we decided to disable skip scan when the number of seeks : for distinct prefix column values exceeds ![](https://latex.codecogs.com/gif.latex?%5Csqrt%7B%5C%23total%20rows%7D). > This could use some explanation as to why sqrt(total_num_rows) was chosen. Yep, it would be nice to add some details around the data and reasoning backing the choice of this disable-skip-scan criterion. 1) As for those experiments, were those using the table schema and query pattern mentioned above? Or did those experiments involve some other table schemas and query patterns? 2) What was the rationale at the conceptual level to choose that sqrt() metric? 
3) If there were multiple candidate criteria to choose from, maybe it's worth mentioning those as well? 4) If 3 is true, was the sqrt() criterion a clear winner, or was there some fuzziness and the sqrt() was chosen also because it looks simpler compared to the others? http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@78 PS4, Line 78: The performance graph of this approach is shown below This is for the schema and query pattern mentioned earlier, right? Maybe, it's worth mentioning that. -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 4 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Comment-Date: Wed, 29 Aug 2018
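For readers following the discussion of the skip scan and the sqrt() criterion, here is a rough Python sketch of the idea, not Kudu's actual implementation. It assumes a composite key of (`host`, `tstamp`, `clusterid`) held as a sorted in-memory list; the sample rows, the function names `skip_scan` and `should_use_skip_scan`, and the exact form of the threshold check are all hypothetical and are just one plausible reading of the criterion quoted above.

```python
import math
from bisect import bisect_left

# Hypothetical rows, assumed to be keyed by (host, tstamp, clusterid).
ROWS = sorted([
    ("helium", 100, 1), ("helium", 200, 1), ("helium", 300, 2),
    ("ubuntu", 100, 3), ("ubuntu", 250, 3),
    ("westeros", 50, 4), ("westeros", 100, 4),
])

def skip_scan(rows, tstamp):
    """Find rows with the given tstamp in a list sorted by the full key.

    For each distinct host prefix, seek straight to (host, tstamp),
    collect the matches, then seek to the next distinct host.
    Returns (matches, number_of_prefixes_visited).
    """
    matches, prefixes, i = [], 0, 0
    while i < len(rows):
        host = rows[i][0]
        prefixes += 1
        # Seek to the first row >= (host, tstamp) within this prefix.
        j = bisect_left(rows, (host, tstamp), i)
        while j < len(rows) and rows[j][:2] == (host, tstamp):
            matches.append(rows[j])
            j += 1
        # Seek past the rest of this prefix. host + "\0" is the smallest
        # string that sorts after host.
        i = bisect_left(rows, (host + "\0",), j)
    return matches, prefixes

def should_use_skip_scan(num_distinct_prefixes, total_rows):
    """Assumed reading of the heuristic: give up on skip scan once the
    number of distinct prefixes exceeds sqrt(total rows)."""
    return num_distinct_prefixes <= math.sqrt(total_rows)
```

In this toy dataset `skip_scan(ROWS, 100)` visits three prefixes and touches only the rows around each seek target, whereas a full scan would evaluate the predicate on all seven rows. The benefit shrinks as the number of distinct prefixes grows, which is the intuition behind having some cutoff at all.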
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Patch Set 4: (16 comments) http://gerrit.cloudera.org:8080/#/c/11263/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11263/4//COMMIT_MSG@6 PS4, Line 6: : Blogpost describing index skip scan optimization. : In reviewing blogposts, it's generally helpful to post a link to a rendered version, e.g. posting to your own github, which will automatically render the *.md http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md: http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@9 PS4, Line 9: does not contain the first column of the composite (multi-column) primary key. This already seems like it's going a bit too far into implementation details. Maybe instead note something like: 'I optimized the Kudu scan-path by implementing a technique called an "index-skip scan."' http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@11 PS4, Line 11: Example : == Probably don't need this. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@14 PS4, Line 14: What do these do? http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@31 PS4, Line 31: as B-Tree nit: "a B-tree", and no need to capitalize "Tree" below http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@33 PS4, Line 33: ("host") nit: perhaps using ``s would be more reasonable here (ie. `host`). Then it'd be formatted as monospace font. 
Here and elsewhere http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@32 PS4, Line 32: The data is sorted lexicographically starting from the leftmost primary key column and stored in the B-Tree leaf nodes. : Therefore, when the user query contains the first key column ("host"), Kudu uses the primary key range push down : operation to optimize the scan time. IMO this doesn't convey the idea that the data is sorted by the composite of all primary key columns. Also not sure what you mean by "primary key range push down operation". Also, overall for this project, I think it's always been helpful to think/reason about it with some example data. I think having a dummy dataset of a handful of rows with a decent number of prefix keys would make this blogpost more understandable to the layperson. It'd also serve as a concrete example of why we can't use the PK index if the predicate doesn't contain the first key. http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@39 PS4, Line 39: nit: here and elsewhere, no need for spaces before punctuation marks http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@46 PS4, Line 46: form a prefix. The crux of this is that the prefixes are also sorted, and all rows of a given prefix are also sorted by the remaining PK columns. A prefix with no other properties isn't necessarily useful, so without calling that out, it might be hard to see why having these prefixes is helpful. 
http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@51 PS4, Line 51: For example, consider the query : : {% highlight SQL %} : select clusterid from metrics where tstamp = 100; : {% endhighlight %} : : ![png]({{ site.github.url }}/img/index-skip-scan/skip-scan-example-table.png){:height="500px" width="500px" .img-responsive} : *Sample rows of Table "metrics" (sorted by key columns for simplicity).* Ah, so you _do_ have an example! I think it'd be helpful setting this up up front, saying, here is how data is organized in Kudu today, and based on that, why it's not straightforward to use the index when there aren't predicates on the first primary key, etc. Also isn't the data _actually_ stored like this? I.e. not for simplicity, but this actually represents how Kudu would see the data, doesn't it? http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@61 PS4, Line 61: host = "helium" nit: backticks here too and everywhere else that has code snippets http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@62 PS4, Line 62: popularly known as index skip scan optimization can skip all the rows for which host = "helium" and tstamp != 100 nit: it's great to get to the point that we
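The point raised in this review about composite-key ordering can be illustrated with a small Python sketch. It is an illustration only, not Kudu code: the rows are invented, the key is simplified to (`host`, `tstamp`), and the helper names `scan_by_host` and `scan_by_tstamp` are hypothetical. It shows that a predicate on the first key column maps to one contiguous range in the sorted data, while a predicate on `tstamp` alone matches rows scattered across every prefix.

```python
from bisect import bisect_left

# Hypothetical rows sorted by the composite key (host, tstamp).
rows = sorted([
    ("helium", 100), ("helium", 200), ("helium", 300),
    ("ubuntu", 100), ("ubuntu", 250),
    ("westeros", 50), ("westeros", 100),
])

def scan_by_host(rows, host):
    """Predicate on the first key column: matches are contiguous, so two
    binary searches bound the range and nothing else is read."""
    lo = bisect_left(rows, (host,))
    hi = bisect_left(rows, (host + "\0",))  # smallest string after host
    return rows[lo:hi]

def scan_by_tstamp(rows, tstamp):
    """Predicate on a later key column only: the sort order does not group
    the matches, so without skipping every row must be checked."""
    return [r for r in rows if r[1] == tstamp]

# Positions of the tstamp = 100 rows in the sorted data.
positions = [i for i, r in enumerate(rows) if r[1] == 100]
```

Here `positions` comes out as `[0, 3, 6]`, one match per `host` prefix with unrelated rows in between. Within each prefix, though, the rows are sorted by `tstamp`, which is the property a skip scan relies on.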
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Hello Alexey Serbin, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11263 to look at the new patch set (#4). Change subject: Blogpost describing index skip scan optimization. .. Blogpost describing index skip scan optimization. Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e --- A _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md A img/index-skip-scan/skip-scan-example-table.png A img/index-skip-scan/skip-scan-performance-graph.png 3 files changed, 95 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/11263/4 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 4 Gerrit-Owner: Anupama Gupta Gerrit-Reviewer: Alexey Serbin
[kudu-CR](gh-pages) Blogpost describing index skip scan optimization.
Anupama Gupta has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/11263 ) Change subject: Blogpost describing index skip scan optimization. .. Blogpost describing index skip scan optimization. Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e --- A _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md A img/index-skip-scan/skip-scan-example-table.png A img/index-skip-scan/skip-scan-performance-graph.png 3 files changed, 64 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/11263/3 -- To view, visit http://gerrit.cloudera.org:8080/11263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e Gerrit-Change-Number: 11263 Gerrit-PatchSet: 3 Gerrit-Owner: Anupama Gupta