Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10809 )
Change subject: [python] : Add Pandas Support to Scanner ...................................................................... Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx File python/kudu/client.pyx: http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@121 PS4, Line 121: are converted incorrectly. converted incorrectly by whom? http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1976 PS4, Line 1976: to be small as Pandas will load the entire contents into memory. nit: add comma before "as" http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1991 PS4, Line 1991: df = pd.DataFrame.from_records(self.read_all_tuples(), this will require _twice_ the total dataset size inn memory because the "all tuples" is instantiated. is it possible to read batch-by-batch and build up a pandas data frame by concatenating smaller ones, perhaps, so that you have a smaller fixed memory overhead? http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@2001 PS4, Line 2001: df[column.name].isnull().any() seems strange to set the type dynamically based on the dataset contents. Is this typical for other Pandas integrations? -- To view, visit http://gerrit.cloudera.org:8080/10809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915 Gerrit-Change-Number: 10809 Gerrit-PatchSet: 4 Gerrit-Owner: Jordan Birdsell <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Mon, 25 Jun 2018 15:09:08 +0000 Gerrit-HasComments: Yes
