Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10809 )

Change subject: [python] : Add Pandas Support to Scanner
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx
File python/kudu/client.pyx:

http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@121
PS4, Line 121:     are converted incorrectly.
converted incorrectly by whom?


http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1976
PS4, Line 1976:         to be small as Pandas will load the entire contents 
into memory.
nit: add comma before "as"


http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1991
PS4, Line 1991:         df = pd.DataFrame.from_records(self.read_all_tuples(),
this will require _twice_ the total dataset size inn memory because the "all 
tuples" is instantiated. is it possible to read batch-by-batch and build up a 
pandas data frame by concatenating smaller ones, perhaps, so that you have a 
smaller fixed memory overhead?


http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@2001
PS4, Line 2001: df[column.name].isnull().any()
seems strange to set the type dynamically based on the dataset contents. Is 
this typical for other Pandas integrations?



--
To view, visit http://gerrit.cloudera.org:8080/10809
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915
Gerrit-Change-Number: 10809
Gerrit-PatchSet: 4
Gerrit-Owner: Jordan Birdsell <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Mon, 25 Jun 2018 15:09:08 +0000
Gerrit-HasComments: Yes

Reply via email to