[GitHub] zeppelin issue #1677: Add doc for exchanging data frames
Github user Leemoonsoo commented on the issue: https://github.com/apache/zeppelin/pull/1677 Merge to master if there're no further discussions --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user m30m commented on the issue: https://github.com/apache/zeppelin/pull/1677 I'm not sure whether it's a good idea to hide this complexity in a special way, and I would need to check whether these changes are backward compatible. So I think a doc-only PR, followed by a JIRA issue to handle some special Spark types, is the better solution.
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/1677 Well, it's a lot quicker to get a doc-only PR in :) Besides, we should have a JIRA for changes like this. It's your call, @m30m.
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1677 If we support the feature I mentioned above in another PR, then the documentation here will be useless, because we will have to update it later anyway. So IMHO it would be better to do it in this PR.
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/1677 Let's keep this as documentation only, and open a JIRA (and another PR) for the DataFrame support?
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1677 Yes, and you also need to update the `__getitem__` method so that users don't need to construct the DataFrame like this:

```
myScalaDataFrame = DataFrame(z.get("myScalaDataFrame"), sqlContext)
```

`z.get("myScalaDataFrame")` should return a DataFrame directly.
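A minimal sketch of the `__getitem__` change described here, assuming `PyZeppelinContext` holds the JVM-side context as `self.z` plus an SQL context to wrap results with. So that it runs without Spark or Py4J, `FakeJavaDataFrame`, `FakeDataFrame`, and `FakeJavaContext` are hypothetical stand-ins for the Py4J handle to a Scala DataFrame, `pyspark.sql.DataFrame(jdf, sql_ctx)`, and the JVM `ZeppelinContext`. How real code would recognize a JVM DataFrame (for example by inspecting its class) is an assumption left out of the sketch.

```python
# Hypothetical stand-ins so this sketch runs without Spark or Py4J.
class FakeJavaDataFrame:
    """Plays the role of the Py4J handle to a Scala DataFrame."""
    def __init__(self, rows):
        self.rows = rows


class FakeDataFrame:
    """Plays the role of pyspark.sql.DataFrame(jdf, sql_ctx)."""
    def __init__(self, jdf, sql_ctx):
        self._jdf = jdf
        self.sql_ctx = sql_ctx


class FakeJavaContext:
    """Plays the role of the JVM-side ZeppelinContext key/value store."""
    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


class PyZeppelinContext:
    def __init__(self, z, sql_ctx):
        self.z = z              # JVM-side context (a stand-in here)
        self.sql_ctx = sql_ctx

    def __getitem__(self, key):
        obj = self.z.get(key)
        # Wrap JVM DataFrames so callers receive a Python DataFrame directly.
        # Assumption: real code would detect the Py4J object's JVM class instead.
        if isinstance(obj, FakeJavaDataFrame):
            return FakeDataFrame(obj, self.sql_ctx)
        return obj

    def get(self, key):
        return self[key]


# Usage: the Scala side has stored a DataFrame; Python reads it back.
jvm_z = FakeJavaContext()
jvm_z.put("myScalaDataFrame", FakeJavaDataFrame([1, 2, 3]))
jvm_z.put("answer", 42)
z = PyZeppelinContext(jvm_z, sql_ctx="sqlContext")
myScalaDataFrame = z.get("myScalaDataFrame")  # no manual DataFrame(...) wrapping
```

Non-DataFrame values pass through unchanged, so existing `z.get` callers would keep working under this approach.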
Github user m30m commented on the issue: https://github.com/apache/zeppelin/pull/1677 Yes, that's a good idea. Shall I add a commit to this branch?
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1677 I mean we can do this internally in `PyZeppelinContext` as follows:

```
def __setitem__(self, key, item):
    if isinstance(item, DataFrame):
        self.z.put(key, item._jdf)
    else:
        self.z.put(key, item)
```
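A runnable sketch of the proposed `__setitem__`, with hypothetical stand-ins (`FakeJavaObject`, `FakeDataFrame`, `RecordingJavaContext`) in place of the Py4J object, `pyspark.sql.DataFrame`, and the JVM `ZeppelinContext`. The only PySpark detail it relies on is that a Python DataFrame keeps its JVM counterpart in `_jdf`, the same attribute the `postsDf._jdf` workaround in this thread uses; having `put` delegate to `__setitem__` is an assumption.

```python
# Hypothetical stand-ins so the sketch runs without Spark or Py4J.
class FakeJavaObject:
    """Plays the role of the JVM Dataset that a PySpark DataFrame wraps."""
    def __init__(self, name):
        self.name = name


class FakeDataFrame:
    """Plays the role of pyspark.sql.DataFrame; only _jdf matters here."""
    def __init__(self, jdf):
        self._jdf = jdf


class RecordingJavaContext:
    """Plays the role of the JVM ZeppelinContext; records what crosses the bridge."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value


class PyZeppelinContext:
    def __init__(self, z):
        self.z = z

    def __setitem__(self, key, item):
        # Only the JVM object can be passed through Py4J, so unwrap DataFrames.
        if isinstance(item, FakeDataFrame):
            self.z.put(key, item._jdf)
        else:
            self.z.put(key, item)

    def put(self, key, value):
        self[key] = value


# Usage: the user passes the DataFrame itself, not postsDf._jdf.
jvm_z = RecordingJavaContext()
z = PyZeppelinContext(jvm_z)
postsDf = FakeDataFrame(FakeJavaObject("posts"))
z.put("myPythonDataFrame", postsDf)
z["answer"] = 42
```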
Github user m30m commented on the issue: https://github.com/apache/zeppelin/pull/1677 It's not possible to put the DataFrame directly because of this error:

```
Exception: Traceback (most recent call last):
  File "/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1124, in __call__
    args_command, temp_args = self._build_args(*args)
  File "/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1094, in _build_args
    [get_command_part(arg, self.pool) for arg in new_args])
  File "/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 289, in get_command_part
    command_part = REFERENCE_TYPE + parameter._get_object_id()
  File "/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 841, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute '_get_object_id'
```
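The traceback shows the mechanism: for an argument it does not encode as a primitive value, Py4J's `get_command_part` builds a reference from `parameter._get_object_id()`. A PySpark `DataFrame` has no such method, and its `__getattr__` treats unknown attribute names as column names, so the lookup ends in the `AttributeError`. The sketch below reproduces that interaction with simplified, hypothetical stand-ins (`FakeJavaObject`, `FakeDataFrame`, and a one-line `get_command_part`; the `"r"` prefix is illustrative), and shows why passing `_jdf` works.

```python
# Simplified stand-ins; not the real Py4J or PySpark code.
REFERENCE_TYPE = "r"  # illustrative prefix for an object reference


class FakeJavaObject:
    """Like a Py4J JavaObject: it knows its id in the JVM object registry."""
    def __init__(self, object_id):
        self._object_id = object_id

    def _get_object_id(self):
        return self._object_id


class FakeDataFrame:
    """Like pyspark.sql.DataFrame: unknown attributes are looked up as columns."""
    def __init__(self, jdf, columns):
        self._jdf = jdf
        self.columns = columns

    def __getattr__(self, name):
        # Only called when normal lookup fails, e.g. for _get_object_id.
        if name not in self.columns:
            raise AttributeError(
                "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
        return "Column<%s>" % name


def get_command_part(parameter):
    """Simplified version of how Py4J encodes a non-primitive argument."""
    return REFERENCE_TYPE + parameter._get_object_id()


df = FakeDataFrame(FakeJavaObject("o42"), columns=["title"])
error_message = None
try:
    get_command_part(df)  # fails, like passing the Python DataFrame to z.put
except AttributeError as err:
    error_message = str(err)
reference = get_command_part(df._jdf)  # works, like passing df._jdf
```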
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1677 Should we do this implicitly for users in `ZeppelinContext`? I feel the following syntax is hard to understand for users who don't know PySpark's internal implementation, and I think we should not expose such internals to users:

```
z.put("myPythonDataFrame", postsDf._jdf)
```
Github user Leemoonsoo commented on the issue: https://github.com/apache/zeppelin/pull/1677 @m30m Awesome! LGTM and merge to master if there're no more comments.