Jordan Birdsell has posted comments on this change. ( http://gerrit.cloudera.org:8080/10809 )
Change subject: [python] : Add Pandas Support to Scanner ...................................................................... Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh@454 PS4, Line 454: pip install pandas > should we move our dependencies to the setup.py and have pandas as a test o see response to grant ^^ http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh@454 PS4, Line 454: pip install pandas > I think it would be more intuitive/maintainable to build this into the pyth Since pandas will be an optional package (even in tests) we cant really add it to any requirements (in a file or in setup.py). This is the pattern that i've seen done for other projects, this line here is merely just to enforce our testing of the pandas tests. http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx File python/kudu/client.pyx: http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@121 PS4, Line 121: are converted incorrectly by Pandas. > converted incorrectly by whom? Done http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1976 PS4, Line 1976: This method call is only needed if you want to explicitly release the > nit: add comma before "as" Done http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1991 PS4, Line 1991: ---------- > this will require _twice_ the total dataset size inn memory because the "al gp, handled this by creating a generator method so we can create the final dataframe by iterating through the batches but only storing them into memory once. http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@2001 PS4, Line 2001: > seems strange to set the type dynamically based on the dataset contents. Is yea, so pandas is weird about nulls and not all data types can have null values stored i believe. For what its worth, i took this approach from pyspark. http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@2004 PS4, Line 2004: : # Here we are using list comprehension with the > is this just a metadata change or does it actually change the data? This is actually casting the data according to the docs. -- To view, visit http://gerrit.cloudera.org:8080/10809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915 Gerrit-Change-Number: 10809 Gerrit-PatchSet: 5 Gerrit-Owner: Jordan Birdsell <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Jordan Birdsell <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Tue, 26 Jun 2018 03:53:25 +0000 Gerrit-HasComments: Yes
