[jira] [Created] (SPARK-18299) Allow more aggregations on KeyValueGroupedDataset
Matthias Niehoff created SPARK-18299:
-------------------------------------

Summary: Allow more aggregations on KeyValueGroupedDataset
Key: SPARK-18299
URL: https://issues.apache.org/jira/browse/SPARK-18299
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.0.1
Reporter: Matthias Niehoff

The number of possible aggregations on a KeyValueGroupedDataset created by groupByKey is limited to 4, as the agg method is only overloaded for up to 4 columns. This limit should be raised or, even better, removed entirely.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
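A minimal Scala sketch of the limitation described above (the case class and the choice of aggregators are illustrative only; the `typed` helpers are from `org.apache.spark.sql.expressions.scalalang` in Spark 2.x):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.scalalang.typed

case class Sale(shop: String, amount: Double, items: Int, discount: Double, tax: Double)

object AggLimitSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("agg-limit").getOrCreate()
    import spark.implicits._

    val sales = Seq(Sale("a", 10.0, 2, 0.5, 1.9), Sale("a", 5.0, 1, 0.0, 0.9)).toDS()

    // Compiles: agg is overloaded for one to four TypedColumn arguments.
    val ok = sales.groupByKey(_.shop).agg(
      typed.sum[Sale](_.amount),
      typed.sum[Sale](_.discount),
      typed.sum[Sale](_.tax),
      typed.avg[Sale](_.items.toDouble))
    ok.show()

    // Does not compile in 2.0.x: there is no five-argument overload of agg,
    // so a fifth aggregation cannot be added to the same call:
    // sales.groupByKey(_.shop).agg(c1, c2, c3, c4, c5)

    spark.stop()
  }
}
```

A common workaround is to fall back to the untyped API (`groupBy(...).agg(...)`, which accepts a varargs list of Columns) at the cost of the typed result.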
[jira] [Created] (SPARK-14236) UDAF does not use incomingSchema for update Method
Matthias Niehoff created SPARK-14236:
-------------------------------------

Summary: UDAF does not use incoming schema for update method
Key: SPARK-14236
URL: https://issues.apache.org/jira/browse/SPARK-14236
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.1
Reporter: Matthias Niehoff
Priority: Minor

When I specify a schema for the incoming data in a UDAF, the schema is not applied to the incoming row in the update method. I can only access the fields by their numeric indices and not by their names: the fields in the row are named input0, input1, ...
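A sketch of the reported behavior using the Spark 1.6 UDAF API (the aggregate is a plain sum, chosen only for illustration):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class SumUdaf extends UserDefinedAggregateFunction {
  // The input field is explicitly named "value" here...
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("sum", DoubleType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0.0

  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    // Works: positional access.
    buffer(0) = buffer.getDouble(0) + input.getDouble(0)
    // ...but, as reported, the row passed to update carries a schema whose
    // field is named "input0" rather than "value", so name-based access fails:
    // buffer(0) = buffer.getDouble(0) + input.getAs[Double]("value")
  }

  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)

  def evaluate(buffer: Row): Double = buffer.getDouble(0)
}
```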
[jira] [Updated] (SPARK-11782) Master Web UI should link to correct Application UI in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-11782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Niehoff updated SPARK-11782:
-------------------------------------
Priority: Minor (was: Major)

> Master Web UI should link to correct Application UI in cluster mode
> -------------------------------------------------------------------
>
> Key: SPARK-11782
> URL: https://issues.apache.org/jira/browse/SPARK-11782
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.4.1
> Reporter: Matthias Niehoff
> Priority: Minor
>
> - Running a standalone cluster, with node1 as master
> - Submit an application to the cluster with deploy-mode=cluster
> - Application driver is on a node other than node1 (e.g. node3)
> => The master Web UI links to node1:4040 for the Application Detail UI and not to node3:4040
>
> As the master knows on which worker the driver is running, it should be possible to show the correct link to the Application Detail UI
[jira] [Commented] (SPARK-11782) Master Web UI should link to correct Application UI in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-11782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021730#comment-15021730 ]

Matthias Niehoff commented on SPARK-11782:
------------------------------------------

I submit the app with deploy-mode cluster, so the driver gets started inside the cluster. This can be any node and does not necessarily have to be the node where spark-submit was executed.

> Master Web UI should link to correct Application UI in cluster mode
> -------------------------------------------------------------------
>
> Key: SPARK-11782
> URL: https://issues.apache.org/jira/browse/SPARK-11782
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.4.1
> Reporter: Matthias Niehoff
>
> - Running a standalone cluster, with node1 as master
> - Submit an application to the cluster with deploy-mode=cluster
> - Application driver is on a node other than node1 (e.g. node3)
> => The master Web UI links to node1:4040 for the Application Detail UI and not to node3:4040
>
> As the master knows on which worker the driver is running, it should be possible to show the correct link to the Application Detail UI
[jira] [Commented] (SPARK-11782) Master Web UI should link to correct Application UI in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-11782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011192#comment-15011192 ]

Matthias Niehoff commented on SPARK-11782:
------------------------------------------

What I did:
- started the master on node 1
- started a worker on node 1
- started a worker on node 2
- ran "spark-submit --deploy-mode cluster " on node 1

On the master UI, the Application Detail UI link contains a URL based on node1, but the driver is started on node2 (and the web app is only reachable on the master?)

Hope this helps :-)

> Master Web UI should link to correct Application UI in cluster mode
> -------------------------------------------------------------------
>
> Key: SPARK-11782
> URL: https://issues.apache.org/jira/browse/SPARK-11782
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.4.1
> Reporter: Matthias Niehoff
>
> - Running a standalone cluster, with node1 as master
> - Submit an application to the cluster with deploy-mode=cluster
> - Application driver is on a node other than node1 (e.g. node3)
> => The master Web UI links to node1:4040 for the Application Detail UI and not to node3:4040
>
> As the master knows on which worker the driver is running, it should be possible to show the correct link to the Application Detail UI
[jira] [Created] (SPARK-11782) Master Web UI should link to correct Application UI in cluster mode
Matthias Niehoff created SPARK-11782:
-------------------------------------

Summary: Master Web UI should link to correct Application UI in cluster mode
Key: SPARK-11782
URL: https://issues.apache.org/jira/browse/SPARK-11782
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.4.1
Reporter: Matthias Niehoff

- Running a standalone cluster, with node1 as master
- Submit an application to the cluster with deploy-mode=cluster
- Application driver is on a node other than node1 (e.g. node3)
=> The master Web UI links to node1:4040 for the Application Detail UI and not to node3:4040

As the master knows on which worker the driver is running, it should be possible to show the correct link to the Application Detail UI
[jira] [Comment Edited] (SPARK-4751) Support dynamic allocation for standalone mode
[ https://issues.apache.org/jira/browse/SPARK-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974119#comment-14974119 ]

Matthias Niehoff edited comment on SPARK-4751 at 10/26/15 8:05 PM:
-------------------------------------------------------------------

The PR is merged, but the documentation at https://spark.apache.org/docs/1.5.1/job-scheduling.html still says: "This feature is currently disabled by default and available only on YARN." Is the documentation just outdated or is it not yet available in 1.5.x?

was (Author: j4nu5):
The PR is merged, but the documentation at https://spark.apache.org/docs/1.5.1/job-scheduling.html still says: "This feature is currently disabled by default and available only on YARN." Is the documentation just outdated or is not yet available in 1.5.x?

> Support dynamic allocation for standalone mode
> ----------------------------------------------
>
> Key: SPARK-4751
> URL: https://issues.apache.org/jira/browse/SPARK-4751
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Priority: Critical
> Fix For: 1.5.0
>
> This is equivalent to SPARK-3822 but for standalone mode.
> This is actually a very tricky issue because the scheduling mechanism in the standalone Master uses different semantics. In standalone mode we allocate resources based on cores. By default, an application will grab all the cores in the cluster unless "spark.cores.max" is specified. Unfortunately, this means an application could get executors of different sizes (in terms of cores) if:
> 1) App 1 kills an executor
> 2) App 2, with "spark.cores.max" set, grabs a subset of cores on a worker
> 3) App 1 requests an executor
> In this case, the new executor that App 1 gets back will be smaller than the rest and can execute fewer tasks in parallel. Further, standalone mode is subject to the constraint that only one executor can be allocated on each worker per application. As a result, it is rather meaningless to request new executors if the existing ones are already spread out across all nodes.
[jira] [Commented] (SPARK-4751) Support dynamic allocation for standalone mode
[ https://issues.apache.org/jira/browse/SPARK-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974119#comment-14974119 ]

Matthias Niehoff commented on SPARK-4751:
------------------------------------------

The PR is merged, but the documentation at https://spark.apache.org/docs/1.5.1/job-scheduling.html still says: "This feature is currently disabled by default and available only on YARN." Is the documentation just outdated or is it not yet available in 1.5.x?

> Support dynamic allocation for standalone mode
> ----------------------------------------------
>
> Key: SPARK-4751
> URL: https://issues.apache.org/jira/browse/SPARK-4751
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Priority: Critical
> Fix For: 1.5.0
>
> This is equivalent to SPARK-3822 but for standalone mode.
> This is actually a very tricky issue because the scheduling mechanism in the standalone Master uses different semantics. In standalone mode we allocate resources based on cores. By default, an application will grab all the cores in the cluster unless "spark.cores.max" is specified. Unfortunately, this means an application could get executors of different sizes (in terms of cores) if:
> 1) App 1 kills an executor
> 2) App 2, with "spark.cores.max" set, grabs a subset of cores on a worker
> 3) App 1 requests an executor
> In this case, the new executor that App 1 gets back will be smaller than the rest and can execute fewer tasks in parallel. Further, standalone mode is subject to the constraint that only one executor can be allocated on each worker per application. As a result, it is rather meaningless to request new executors if the existing ones are already spread out across all nodes.
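Assuming the documentation is simply outdated (the ticket's Fix Version is 1.5.0), enabling the feature on a standalone cluster would look roughly like the following sketch. The master URL and my-app.jar are placeholders; property names are from the Spark configuration docs, and dynamic allocation additionally requires the external shuffle service on every worker:

```shell
# On each worker, the external shuffle service must be running; in
# standalone mode this is controlled by setting
# spark.shuffle.service.enabled=true before starting the workers.

spark-submit \
  --master spark://node1:7077 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  my-app.jar
```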