[kudu-CR] [python] : Add Pandas Support to Scanner

Jordan Birdsell (Code Review) Mon, 25 Jun 2018 20:53:53 -0700

Jordan Birdsell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10809 )


Change subject: [python] : Add Pandas Support to Scanner
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh@454
PS4, Line 454: pip install pandas
> should we move our dependencies to the setup.py and have pandas as a test o
see response to grant ^^


http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh@454
PS4, Line 454:   pip install pandas
> I think it would be more intuitive/maintainable to build this into the pyth
Since pandas will be an optional package (even in tests) we cant really add it 
to any requirements (in a file or in setup.py).  This is the pattern that i've 
seen done for other projects, this line here is merely just to enforce our 
testing of the pandas tests.


http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx
File python/kudu/client.pyx:

http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@121
PS4, Line 121:     are converted incorrectly by Pandas.
> converted incorrectly by whom?
Done


http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1976
PS4, Line 1976:         This method call is only needed if you want to 
explicitly release the
> nit: add comma before "as"
Done


http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1991
PS4, Line 1991:         ----------
> this will require _twice_ the total dataset size inn memory because the "al
gp, handled this by creating a generator method so we can create the final 
dataframe by iterating through the batches but only storing them into memory 
once.


http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@2001
PS4, Line 2001:
> seems strange to set the type dynamically based on the dataset contents. Is
yea, so pandas is weird about nulls and not all data types can have null values 
stored i believe. For what its worth, i took this approach from pyspark.


http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@2004
PS4, Line 2004:
              :         # Here we are using list comprehension with the
> is this just a metadata change or does it actually change the data?
This is actually casting the data according to the docs.



--
To view, visit http://gerrit.cloudera.org:8080/10809
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915
Gerrit-Change-Number: 10809
Gerrit-PatchSet: 5
Gerrit-Owner: Jordan Birdsell <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Jordan Birdsell <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Tue, 26 Jun 2018 03:53:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] [python] : Add Pandas Support to Scanner

Reply via email to