Re: Expand the Spark SQL programming guide?
The examples look great indeed. Seems a good addition to the existing documentation. I understand the UDAF examples don't apply to Python, but is there any relevant reason to skip the Python API altogether in this window functions documentation?

On 20 December 2016 at 16:56, Jim Hughes <jn...@ccri.com> wrote:

> Hi Anton,
>
> Your example and documentation look great! I left some comments
> suggesting a few additions, but the PR in its current state is a great
> improvement!
>
> Thanks,
>
> Jim
Re: Expand the Spark SQL programming guide?
Hi Anton,

Your example and documentation look great! I left some comments suggesting a few additions, but the PR in its current state is a great improvement!

Thanks,

Jim

On 12/18/2016 09:09 AM, Anton Okolnychyi wrote:

> Any comments/suggestions are more than welcome.
>
> Thanks,
> Anton
Re: Expand the Spark SQL programming guide?
Any comments/suggestions are more than welcome.

Thanks,
Anton

2016-12-18 15:08 GMT+01:00 Anton Okolnychyi <anton.okolnyc...@gmail.com>:

> Here is the pull request: https://github.com/apache/spark/pull/16329
Re: Expand the Spark SQL programming guide?
Here is the pull request: https://github.com/apache/spark/pull/16329

2016-12-16 20:54 GMT+01:00 Jim Hughes <jn...@ccri.com>:

> I'd be happy to review a PR. At the minute, I'm still learning Spark SQL,
> so writing documentation might be a bit of a stretch, but reviewing would
> be fine.
>
> Thanks!
Re: Expand the Spark SQL programming guide?
I'd be happy to review a PR. At the minute, I'm still learning Spark SQL, so writing documentation might be a bit of a stretch, but reviewing would be fine.

Thanks!

On 12/16/2016 08:39 AM, Thakrar, Jayesh wrote:

> Yes - that sounds good Anton, I can work on documenting the window
> functions.
Re: Expand the Spark SQL programming guide?
Yes - that sounds good Anton, I can work on documenting the window functions.

From: Anton Okolnychyi <anton.okolnyc...@gmail.com>
Date: Thursday, December 15, 2016 at 4:34 PM
To: Conversant <jthak...@conversantmedia.com>
Cc: Michael Armbrust <mich...@databricks.com>, Jim Hughes <jn...@ccri.com>, "dev@spark.apache.org" <dev@spark.apache.org>
Subject: Re: Expand the Spark SQL programming guide?

> I think it will make sense to show a sample implementation of
> UserDefinedAggregateFunction for DataFrames, and an example of the
> Aggregator API for typed Datasets.
>
> Jim, what if I submit a PR and you join the review process? I also do not
> mind splitting this if you want, but it seems to be overkill for this
> part.
>
> Jayesh, shall I skip the window functions part since you are going to work
> on that?
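For readers new to the window functions mentioned above, here is a minimal sketch of what they compute, written in plain Python rather than against the Spark API. The row layout, the sample data, and the helper name `rank_and_running_total` are all hypothetical. It partitions rows by department, orders each partition by salary descending, and emits a SQL-style rank (ties share a rank, with gaps afterwards) together with a row-based running total.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout for illustration: (name, dept, salary).
Row = Tuple[str, str, int]
Ranked = Tuple[str, str, int, int, int]  # row + (rank, running_total)


def rank_and_running_total(rows: List[Row]) -> List[Ranked]:
    """Emulate a window partitioned by dept and ordered by salary descending."""
    # Step 1: partition the rows by department.
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[1]].append(row)

    out: List[Ranked] = []
    for dept in sorted(partitions):
        # Step 2: order each partition (stable sort keeps input order on ties).
        ordered = sorted(partitions[dept], key=lambda r: -r[2])
        running = 0
        rank = 0
        prev_salary = None
        # Step 3: walk the ordered partition and compute per-row values.
        for position, (name, _, salary) in enumerate(ordered, start=1):
            if salary != prev_salary:
                rank = position  # ties keep the earlier rank, leaving a gap
                prev_salary = salary
            running += salary  # row-based frame: start of partition to this row
            out.append((name, dept, salary, rank, running))
    return out


rows = [
    ("Ann", "eng", 5000),
    ("Bob", "eng", 4000),
    ("Cid", "eng", 5000),
    ("Dee", "ops", 3000),
]
ranked = rank_and_running_total(rows)
for r in ranked:
    print(r)
```

Unlike a grouped aggregation, every input row is kept and gains extra columns. In Spark itself the same idea would be expressed with a window specification (for example `Window.partitionBy(...).orderBy(...)` in PySpark) instead of hand-written loops.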
Re: Expand the Spark SQL programming guide?
I think it will make sense to show a sample implementation of UserDefinedAggregateFunction for DataFrames, and an example of the Aggregator API for typed Datasets.

Jim, what if I submit a PR and you join the review process? I also do not mind splitting this if you want, but it seems to be overkill for this part.

Jayesh, shall I skip the window functions part since you are going to work on that?

2016-12-15 22:48 GMT+01:00 Thakrar, Jayesh <jthak...@conversantmedia.com>:

> I too am interested in expanding the documentation for Spark SQL.
>
> For my work I needed to get some info/examples/guidance on window
> functions and have been using
> https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html .
>
> How about divide and conquer?
>
> From: Michael Armbrust <mich...@databricks.com>
> Date: Thursday, December 15, 2016 at 3:21 PM
> To: Jim Hughes <jn...@ccri.com>
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: Re: Expand the Spark SQL programming guide?
>
> Pull requests would be welcome for any major missing features in the
> guide:
> https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md
>
> On Thu, Dec 15, 2016 at 11:48 AM, Jim Hughes <jn...@ccri.com> wrote:
>
> Hi Anton,
>
> I'd like to see this as well. I've been working on implementing
> geospatial user-defined types and functions. Having examples of
> aggregations and window functions would be awesome!
>
> I did test out implementing a distributed convex hull as a
> UserDefinedAggregateFunction, and that seemed to work sensibly.
>
> Cheers,
>
> Jim
Re: Expand the Spark SQL programming guide?
Hi Anton,

I'd like to see this as well. I've been working on implementing geospatial user-defined types and functions. Having examples of aggregations and window functions would be awesome!

I did test out implementing a distributed convex hull as a UserDefinedAggregateFunction, and that seemed to work sensibly.

Cheers,

Jim

On 12/15/2016 03:28 AM, Anton Okolnychyi wrote:

> Hi,
>
> I am wondering whether it makes sense to expand the Spark SQL programming
> guide with examples of aggregations (including user-defined ones via the
> Aggregator API) and window functions. For instance, there might be a
> separate subsection under "Getting Started" for each feature.
>
> SPARK-16046 seems to be related, but there has been no activity for more
> than 4 months.
>
> Best regards,
>
> Anton
Expand the Spark SQL programming guide?
Hi,

I am wondering whether it makes sense to expand the Spark SQL programming guide with examples of aggregations (including user-defined ones via the Aggregator API) and window functions. For instance, there might be a separate subsection under "Getting Started" for each feature.

SPARK-16046 seems to be related, but there has been no activity for more than 4 months.

Best regards,

Anton
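To make the kind of example under discussion concrete, here is a minimal sketch of the contract that a user-defined aggregation follows. It is plain Python, not the Spark API. The method names mirror the `zero`, `reduce`, `merge` and `finish` methods of Spark's Scala `Aggregator`, while the classes `AvgBuffer` and `AverageAggregator`, the `aggregate` driver, and the sample data are hypothetical. Each partition is folded into a partial buffer, the partial buffers are merged, and the final buffer is turned into the result.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List


@dataclass(frozen=True)
class AvgBuffer:
    """Intermediate aggregation state: running sum and element count."""
    total: float = 0.0
    count: int = 0


class AverageAggregator:
    """Hypothetical stand-in illustrating the zero/reduce/merge/finish contract."""

    def zero(self) -> AvgBuffer:
        # Neutral starting buffer; merging it with any buffer changes nothing.
        return AvgBuffer()

    def reduce(self, buf: AvgBuffer, value: float) -> AvgBuffer:
        # Fold one input value into a buffer.
        return AvgBuffer(buf.total + value, buf.count + 1)

    def merge(self, a: AvgBuffer, b: AvgBuffer) -> AvgBuffer:
        # Combine two partial buffers computed on different partitions.
        return AvgBuffer(a.total + b.total, a.count + b.count)

    def finish(self, buf: AvgBuffer) -> float:
        # Turn the final buffer into the output value.
        return buf.total / buf.count if buf.count else float("nan")


def aggregate(agg: AverageAggregator, partitions: Iterable[List[float]]) -> float:
    """Simulate a distributed run: per-partition reduce, then a global merge."""
    partials = [reduce(agg.reduce, part, agg.zero()) for part in partitions]
    return agg.finish(reduce(agg.merge, partials, agg.zero()))


# Hypothetical data split across three partitions.
salaries = [[3000.0, 4500.0], [3500.0], [4000.0]]
result = aggregate(AverageAggregator(), salaries)
print(result)
```

Keeping the sum and the count in the buffer, rather than a running average, is what makes `merge` correct regardless of how the data is partitioned.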