[jira] [Updated] (SPARK-21358) Argument of repartitionandsortwithinpartitions at pyspark
[ https://issues.apache.org/jira/browse/SPARK-21358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chie hayashida updated SPARK-21358:
---
    Description: 
In rdd.py, the implementation of repartitionAndSortWithinPartitions is as follows.
{code}
    def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
                                           ascending=True, keyfunc=lambda x: x):
{code}
And the documentation has the following sample script.
{code}
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
{code}
The third argument (ascending) is expected to be a boolean, so I think the following script is better.
{code}
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
{code}

was:
In rdd.py, the implementation of repartitionAndSortWithinPartitions is as follows.
{code:python}
    def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
                                           ascending=True, keyfunc=lambda x: x):
{code}
And the documentation has the following sample script.
{code:python}
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
{code}
The third argument (ascending) is expected to be a boolean, so I think the following script is better.
{code:python}
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
{code}

> Argument of repartitionandsortwithinpartitions at pyspark
> -
>
> Key: SPARK-21358
> URL: https://issues.apache.org/jira/browse/SPARK-21358
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, Examples
> Affects Versions: 2.1.1
> Reporter: chie hayashida
> Priority: Minor
>
> In rdd.py, the implementation of repartitionAndSortWithinPartitions is as follows.
> {code}
>     def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
>                                            ascending=True, keyfunc=lambda x: x):
> {code}
> And the documentation has the following sample script.
> {code}
> >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
> >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
> {code}
> The third argument (ascending) is expected to be a boolean, so I think the following script is better.
> {code}
> >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
> >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21358) Argument of repartitionandsortwithinpartitions at pyspark
[ https://issues.apache.org/jira/browse/SPARK-21358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chie hayashida updated SPARK-21358:
---
    Description: 
In rdd.py, the implementation of repartitionAndSortWithinPartitions is as follows.
{code:python}
    def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
                                           ascending=True, keyfunc=lambda x: x):
{code}
And the documentation has the following sample script.
{code:python}
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
{code}
The third argument (ascending) is expected to be a boolean, so I think the following script is better.
{code:python}
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
{code}

was:
In rdd.py, the implementation of repartitionAndSortWithinPartitions is as follows.
```
    def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
                                           ascending=True, keyfunc=lambda x: x):
```
And the documentation has the following sample script.
```
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
```
The third argument (ascending) is expected to be a boolean, so I think the following script is better.
```
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
```

> Argument of repartitionandsortwithinpartitions at pyspark
> -
>
> Key: SPARK-21358
> URL: https://issues.apache.org/jira/browse/SPARK-21358
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, Examples
> Affects Versions: 2.1.1
> Reporter: chie hayashida
> Priority: Minor
>
> In rdd.py, the implementation of repartitionAndSortWithinPartitions is as follows.
> {code:python}
> def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
>                                        ascending=True, keyfunc=lambda x: x):
> {code}
> And the documentation has the following sample script.
> {code:python}
> >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
> >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
> {code}
> The third argument (ascending) is expected to be a boolean, so I think the following script is better.
> {code:python}
> >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
> >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21358) Argument of repartitionandsortwithinpartitions at pyspark
[ https://issues.apache.org/jira/browse/SPARK-21358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chie hayashida updated SPARK-21358:
---
    Summary: Argument of repartitionandsortwithinpartitions at pyspark  (was: variable of repartitionandsortwithinpartitions at pyspark)

> Argument of repartitionandsortwithinpartitions at pyspark
> -
>
> Key: SPARK-21358
> URL: https://issues.apache.org/jira/browse/SPARK-21358
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, Examples
> Affects Versions: 2.1.1
> Reporter: chie hayashida
> Priority: Minor
>
> In rdd.py, the implementation of repartitionAndSortWithinPartitions is as follows.
> ```
>    def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
>                                           ascending=True, keyfunc=lambda x: x):
> ```
> And the documentation has the following sample script.
> ```
> >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
> >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
> ```
> The third argument (ascending) is expected to be a boolean, so I think the following script is better.
> ```
> >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
> >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
> ```

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21358) variable of repartitionandsortwithinpartitions at pyspark
chie hayashida created SPARK-21358:
--

Summary: variable of repartitionandsortwithinpartitions at pyspark
Key: SPARK-21358
URL: https://issues.apache.org/jira/browse/SPARK-21358
Project: Spark
Issue Type: Improvement
Components: Documentation, Examples
Affects Versions: 2.1.1
Reporter: chie hayashida
Priority: Minor

In rdd.py, the implementation of repartitionAndSortWithinPartitions is as follows.
```
    def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
                                           ascending=True, keyfunc=lambda x: x):
```
And the documentation has the following sample script.
```
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
```
The third argument (ascending) is expected to be a boolean, so I think the following script is better.
```
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
```

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
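For context, a minimal PySpark sketch of the call discussed in SPARK-21358. It assumes only a local Spark installation; the keyword-argument variant at the end is an addition that is not taken from the issue, shown because it sidesteps the positional confusion between partitionFunc and ascending entirely.

{code:python}
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])

# Positional form from the docs: the third positional argument is `ascending`,
# so it should be a boolean (True), not the number 2.
rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)

# Same call with keyword arguments, which avoids the ambiguity altogether.
rdd3 = rdd.repartitionAndSortWithinPartitions(numPartitions=2,
                                              partitionFunc=lambda x: x % 2,
                                              ascending=True)

# Keys 0 and 2 go to partition 0, keys 1 and 3 to partition 1;
# each partition is then sorted by key.
print(rdd2.glom().collect())
# e.g. [[(0, 5), (0, 8), (2, 6)], [(1, 3), (3, 8), (3, 8)]]
{code}

Both calls should produce the same partitioning; the keyword form simply makes the intent explicit.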
[jira] [Comment Edited] (SPARK-17154) Wrong result can be returned or AnalysisException can be thrown after self-join or similar operations
[ https://issues.apache.org/jira/browse/SPARK-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807077#comment-15807077 ]

chie hayashida edited comment on SPARK-17154 at 1/7/17 8:18 AM:

[~nsyca], [~cloud_fan], [~sarutak]
I have example code below.

h2. Example 1
{code}
scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2")
df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field]

scala> val df2 = df
df2: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field]

scala> val df3 = df.join(df2,df("id") === df2("id") && df("value2") <= df2("value2"))
17/01/07 16:29:26 WARN Column: Constructing trivially true equals predicate, 'id#171 = id#171'. Perhaps you need to use aliases.
df3: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields]

scala> df3.show
+---+--+--+---+--+--+
| id|value1|value2| id|value1|value2|
+---+--+--+---+--+--+
| 1| 1| 1| 1| 4| 5|
| 1| 1| 1| 1| 2| 3|
| 1| 1| 1| 1| 1| 1|
| 1| 2| 3| 1| 4| 5|
| 1| 2| 3| 1| 2| 3|
| 1| 2| 3| 1| 1| 1|
| 1| 4| 5| 1| 4| 5|
| 1| 4| 5| 1| 2| 3|
| 1| 4| 5| 1| 1| 1|
| 2| 2| 4| 2| 8| 8|
| 2| 2| 4| 2| 5| 7|
| 2| 2| 4| 2| 2| 4|
| 2| 5| 7| 2| 8| 8|
| 2| 5| 7| 2| 5| 7|
| 2| 5| 7| 2| 2| 4|
| 2| 8| 8| 2| 8| 8|
| 2| 8| 8| 2| 5| 7|
| 2| 8| 8| 2| 2| 4|
+---+--+--+---+--+--+

scala> df3.explain
== Physical Plan ==
*BroadcastHashJoin [id#171], [id#178], Inner, BuildRight
:- LocalTableScan [id#171, value1#172, value2#173]
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
   +- LocalTableScan [id#178, value1#179, value2#180]
{code}

h2. Example 2
{code}
scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2")
df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field]

scala> val df2 = df.select($"id".as("id2"),$"value1".as("value11"),$"value2".as("value22"))
df4: org.apache.spark.sql.DataFrame = [id2: int, value11: int ... 1 more field]

scala> val df3 = df.join(df2,df("id") === df2("id2") && df("value2") <= df2("value22"))
df5: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields]

scala> df3.show
+---+--+--+---+---+---+
| id|value1|value2|id2|value11|value22|
+---+--+--+---+---+---+
| 1| 1| 1| 1| 4| 5|
| 1| 1| 1| 1| 2| 3|
| 1| 1| 1| 1| 1| 1|
| 1| 2| 3| 1| 4| 5|
| 1| 2| 3| 1| 2| 3|
| 1| 4| 5| 1| 4| 5|
| 2| 2| 4| 2| 8| 8|
| 2| 2| 4| 2| 5| 7|
| 2| 2| 4| 2| 2| 4|
| 2| 5| 7| 2| 8| 8|
| 2| 5| 7| 2| 5| 7|
| 2| 8| 8| 2| 8| 8|
+---+--+--+---+---+---+

scala> df3.explain
== Physical Plan ==
*BroadcastHashJoin [id#171], [id2#243], Inner, BuildRight, (value2#173 <= value22#245)
:- LocalTableScan [id#171, value1#172, value2#173]
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
   +- LocalTableScan [id2#243, value11#244, value22#245]
{code}

The contents of df3 are different between Example 1 and Example 2. I think the reason for this is the same as SPARK-17154. In the above case I understand the result of Example 1 is incorrect and that of Example 2 is correct. But this issue isn't trivial and some developers may overlook this buggy code, I think. Permanent action should be taken for this issue, I think.

was (Author: hayashidac):
[~nsyca], [~cloud_fan], [~sarutak]
I have an example code below.
h2. Example 1
{code}
scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2")
df: org.apache.spark.sql.DataFrame = [id: int, value1: int ...
1 more field] scala> val df2 = df df2: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id") && df("value2") <= df2("value2")) 17/01/07 16:29:26 WARN Column: Constructing trivially true equals predicate, 'id#171 = id#171'. Perhaps you need to use aliases. df3: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+--+--+ | id|value1|value2| id|value1|value2| +---+--+--+---+--+--+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2|
[jira] [Comment Edited] (SPARK-17154) Wrong result can be returned or AnalysisException can be thrown after self-join or similar operations
[ https://issues.apache.org/jira/browse/SPARK-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807077#comment-15807077 ] chie hayashida edited comment on SPARK-17154 at 1/7/17 8:17 AM: [~nsyca], [~cloud_fan], [~sarutak] I have an example code below. h2. Example 1 {code} scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df2 = df df2: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id") && df("value2") <= df2("value2")) 17/01/07 16:29:26 WARN Column: Constructing trivially true equals predicate, 'id#171 = id#171'. Perhaps you need to use aliases. df3: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+--+--+ | id|value1|value2| id|value1|value2| +---+--+--+---+--+--+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2| 3| | 1| 2| 3| 1| 1| 1| | 1| 4| 5| 1| 4| 5| | 1| 4| 5| 1| 2| 3| | 1| 4| 5| 1| 1| 1| | 2| 2| 4| 2| 8| 8| | 2| 2| 4| 2| 5| 7| | 2| 2| 4| 2| 2| 4| | 2| 5| 7| 2| 8| 8| | 2| 5| 7| 2| 5| 7| | 2| 5| 7| 2| 2| 4| | 2| 8| 8| 2| 8| 8| | 2| 8| 8| 2| 5| 7| | 2| 8| 8| 2| 2| 4| +---+--+--+---+--+--+ scala> df3.explain == Physical Plan == *BroadcastHashJoin [id#171], [id#178], Inner, BuildRight :- LocalTableScan [id#171, value1#172, value2#173] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))) +- LocalTableScan [id#178, value1#179, value2#180] {code} h2. Example2 {code} scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df2 = df.select($"id".as("id2"),$"value1".as("value11"),$"value2".as("value22")) df4: org.apache.spark.sql.DataFrame = [id2: int, value11: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id2") && df("value2") <= df2("value22")) df5: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+---+---+ | id|value1|value2|id2|value11|value22| +---+--+--+---+---+---+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2| 3| | 1| 4| 5| 1| 4| 5| | 2| 2| 4| 2| 8| 8| | 2| 2| 4| 2| 5| 7| | 2| 2| 4| 2| 2| 4| | 2| 5| 7| 2| 8| 8| | 2| 5| 7| 2| 5| 7| | 2| 8| 8| 2| 8| 8| +---+--+--+---+---+---+ scala> df3.explain == Physical Plan == *BroadcastHashJoin [id#171], [id2#243], Inner, BuildRight, (value2#173 <= value22#245) :- LocalTableScan [id#171, value1#172, value2#173] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))) +- LocalTableScan [id2#243, value11#244, value22#245] {code} The content of df3 are different between Example1 and Example2. I think the reason of this is same as SPARK-17154. In above case I understand result of Example1 is incollect and that of Example 2 is collect. But this issue isn't trivial and some developer may overlook this buggy code, I think. Permanent action should be taken for this issue, I think. was (Author: hayashidac): [~nsyca], [~cloud_fan], [~sarutak] I have an example code below. h2. Example 1 {code} scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 
1 more field] scala> val df2 = df df2: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id") && df("value2") <= df2("value2")) 17/01/07 16:29:26 WARN Column: Constructing trivially true equals predicate, 'id#171 = id#171'. Perhaps you need to use aliases. df3: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+--+--+ | id|value1|value2| id|value1|value2| +---+--+--+---+--+--+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1|
[jira] [Comment Edited] (SPARK-17154) Wrong result can be returned or AnalysisException can be thrown after self-join or similar operations
[ https://issues.apache.org/jira/browse/SPARK-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807077#comment-15807077 ] chie hayashida edited comment on SPARK-17154 at 1/7/17 8:17 AM: [~nsyca], [~cloud_fan], [~sarutak] I have an example code below. h2. Example 1 {code} scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df2 = df df2: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id") && df("value2") <= df2("value2")) 17/01/07 16:29:26 WARN Column: Constructing trivially true equals predicate, 'id#171 = id#171'. Perhaps you need to use aliases. df3: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+--+--+ | id|value1|value2| id|value1|value2| +---+--+--+---+--+--+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2| 3| | 1| 2| 3| 1| 1| 1| | 1| 4| 5| 1| 4| 5| | 1| 4| 5| 1| 2| 3| | 1| 4| 5| 1| 1| 1| | 2| 2| 4| 2| 8| 8| | 2| 2| 4| 2| 5| 7| | 2| 2| 4| 2| 2| 4| | 2| 5| 7| 2| 8| 8| | 2| 5| 7| 2| 5| 7| | 2| 5| 7| 2| 2| 4| | 2| 8| 8| 2| 8| 8| | 2| 8| 8| 2| 5| 7| | 2| 8| 8| 2| 2| 4| +---+--+--+---+--+--+ scala> df3.explain == Physical Plan == *BroadcastHashJoin [id#171], [id#178], Inner, BuildRight :- LocalTableScan [id#171, value1#172, value2#173] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))) +- LocalTableScan [id#178, value1#179, value2#180] {code} h2 Example2 {code} scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df2 = df.select($"id".as("id2"),$"value1".as("value11"),$"value2".as("value22")) df4: org.apache.spark.sql.DataFrame = [id2: int, value11: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id2") && df("value2") <= df2("value22")) df5: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+---+---+ | id|value1|value2|id2|value11|value22| +---+--+--+---+---+---+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2| 3| | 1| 4| 5| 1| 4| 5| | 2| 2| 4| 2| 8| 8| | 2| 2| 4| 2| 5| 7| | 2| 2| 4| 2| 2| 4| | 2| 5| 7| 2| 8| 8| | 2| 5| 7| 2| 5| 7| | 2| 8| 8| 2| 8| 8| +---+--+--+---+---+---+ scala> df3.explain == Physical Plan == *BroadcastHashJoin [id#171], [id2#243], Inner, BuildRight, (value2#173 <= value22#245) :- LocalTableScan [id#171, value1#172, value2#173] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))) +- LocalTableScan [id2#243, value11#244, value22#245] {code} The content of df3 are different between Example1 and Example2. I think the reason of this is same as SPARK-17154. In above case I understand result of Example1 is incollect and that of Example 2 is collect. But this issue isn't trivial and some developer may overlook this buggy code, I think. Permanent action should be taken for this issue, I think. was (Author: hayashidac): [~nsyca], [~cloud_fan], [~sarutak] I have an example code below. # Example 1 {code} scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 
1 more field] scala> val df2 = df df2: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id") && df("value2") <= df2("value2")) 17/01/07 16:29:26 WARN Column: Constructing trivially true equals predicate, 'id#171 = id#171'. Perhaps you need to use aliases. df3: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+--+--+ | id|value1|value2| id|value1|value2| +---+--+--+---+--+--+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2|
[jira] [Comment Edited] (SPARK-17154) Wrong result can be returned or AnalysisException can be thrown after self-join or similar operations
[ https://issues.apache.org/jira/browse/SPARK-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807077#comment-15807077 ] chie hayashida edited comment on SPARK-17154 at 1/7/17 8:14 AM: [~nsyca], [~cloud_fan], [~sarutak] I have an example code below. # Example 1 {code} scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df2 = df df2: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id") && df("value2") <= df2("value2")) 17/01/07 16:29:26 WARN Column: Constructing trivially true equals predicate, 'id#171 = id#171'. Perhaps you need to use aliases. df3: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+--+--+ | id|value1|value2| id|value1|value2| +---+--+--+---+--+--+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2| 3| | 1| 2| 3| 1| 1| 1| | 1| 4| 5| 1| 4| 5| | 1| 4| 5| 1| 2| 3| | 1| 4| 5| 1| 1| 1| | 2| 2| 4| 2| 8| 8| | 2| 2| 4| 2| 5| 7| | 2| 2| 4| 2| 2| 4| | 2| 5| 7| 2| 8| 8| | 2| 5| 7| 2| 5| 7| | 2| 5| 7| 2| 2| 4| | 2| 8| 8| 2| 8| 8| | 2| 8| 8| 2| 5| 7| | 2| 8| 8| 2| 2| 4| +---+--+--+---+--+--+ scala> df3.explain == Physical Plan == *BroadcastHashJoin [id#171], [id#178], Inner, BuildRight :- LocalTableScan [id#171, value1#172, value2#173] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))) +- LocalTableScan [id#178, value1#179, value2#180] {code} # Example2 {code} scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df2 = df.select($"id".as("id2"),$"value1".as("value11"),$"value2".as("value22")) df4: org.apache.spark.sql.DataFrame = [id2: int, value11: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id2") && df("value2") <= df2("value22")) df5: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+---+---+ | id|value1|value2|id2|value11|value22| +---+--+--+---+---+---+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2| 3| | 1| 4| 5| 1| 4| 5| | 2| 2| 4| 2| 8| 8| | 2| 2| 4| 2| 5| 7| | 2| 2| 4| 2| 2| 4| | 2| 5| 7| 2| 8| 8| | 2| 5| 7| 2| 5| 7| | 2| 8| 8| 2| 8| 8| +---+--+--+---+---+---+ scala> df3.explain == Physical Plan == *BroadcastHashJoin [id#171], [id2#243], Inner, BuildRight, (value2#173 <= value22#245) :- LocalTableScan [id#171, value1#172, value2#173] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))) +- LocalTableScan [id2#243, value11#244, value22#245] {code} The content of df3 are different between Example1 and Example2. I think the reason of this is same as SPARK-17154. In above case I understand result of Example1 is incollect and that of Example 2 is collect. But this issue isn't trivial and some developer may overlook this buggy code, I think. Permanent action should be taken for this issue, I think. was (Author: hayashidac): [~nsyca], [~cloud_fan], [~sarutak] I have an example code below. # Example 1 ``` scala scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 
1 more field] scala> val df2 = df df2: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id") && df("value2") <= df2("value2")) 17/01/07 16:29:26 WARN Column: Constructing trivially true equals predicate, 'id#171 = id#171'. Perhaps you need to use aliases. df3: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+--+--+ | id|value1|value2| id|value1|value2| +---+--+--+---+--+--+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2|
[jira] [Commented] (SPARK-17154) Wrong result can be returned or AnalysisException can be thrown after self-join or similar operations
[ https://issues.apache.org/jira/browse/SPARK-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807077#comment-15807077 ] chie hayashida commented on SPARK-17154: [~nsyca], [~cloud_fan], [~sarutak] I have an example code below. # Example 1 ``` scala scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df2 = df df2: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id") && df("value2") <= df2("value2")) 17/01/07 16:29:26 WARN Column: Constructing trivially true equals predicate, 'id#171 = id#171'. Perhaps you need to use aliases. df3: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+--+--+ | id|value1|value2| id|value1|value2| +---+--+--+---+--+--+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2| 3| | 1| 2| 3| 1| 1| 1| | 1| 4| 5| 1| 4| 5| | 1| 4| 5| 1| 2| 3| | 1| 4| 5| 1| 1| 1| | 2| 2| 4| 2| 8| 8| | 2| 2| 4| 2| 5| 7| | 2| 2| 4| 2| 2| 4| | 2| 5| 7| 2| 8| 8| | 2| 5| 7| 2| 5| 7| | 2| 5| 7| 2| 2| 4| | 2| 8| 8| 2| 8| 8| | 2| 8| 8| 2| 5| 7| | 2| 8| 8| 2| 2| 4| +---+--+--+---+--+--+ scala> df3.explain == Physical Plan == *BroadcastHashJoin [id#171], [id#178], Inner, BuildRight :- LocalTableScan [id#171, value1#172, value2#173] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))) +- LocalTableScan [id#178, value1#179, value2#180] ``` # Example2 ```scala scala> val df = Seq((1,1,1),(1,2,3),(1,4,5),(2,2,4),(2,5,7),(2,8,8)).toDF("id","value1","value2") df: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 1 more field] scala> val df2 = df.select($"id".as("id2"),$"value1".as("value11"),$"value2".as("value22")) df4: org.apache.spark.sql.DataFrame = [id2: int, value11: int ... 1 more field] scala> val df3 = df.join(df2,df("id") === df2("id2") && df("value2") <= df2("value22")) df5: org.apache.spark.sql.DataFrame = [id: int, value1: int ... 4 more fields] scala> df3.show +---+--+--+---+---+---+ | id|value1|value2|id2|value11|value22| +---+--+--+---+---+---+ | 1| 1| 1| 1| 4| 5| | 1| 1| 1| 1| 2| 3| | 1| 1| 1| 1| 1| 1| | 1| 2| 3| 1| 4| 5| | 1| 2| 3| 1| 2| 3| | 1| 4| 5| 1| 4| 5| | 2| 2| 4| 2| 8| 8| | 2| 2| 4| 2| 5| 7| | 2| 2| 4| 2| 2| 4| | 2| 5| 7| 2| 8| 8| | 2| 5| 7| 2| 5| 7| | 2| 8| 8| 2| 8| 8| +---+--+--+---+---+---+ scala> df3.explain == Physical Plan == *BroadcastHashJoin [id#171], [id2#243], Inner, BuildRight, (value2#173 <= value22#245) :- LocalTableScan [id#171, value1#172, value2#173] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))) +- LocalTableScan [id2#243, value11#244, value22#245] ``` The content of df3 are different between Example1 and Example2. I think the reason of this is same as SPARK-17154. In above case I understand result of Example1 is incollect and that of Example 2 is collect. But this issue isn't trivial and some developer may overlook this buggy code, I think. Permanent action should be taken for this issue, I think. 
> Wrong result can be returned or AnalysisException can be thrown after > self-join or similar operations > - > > Key: SPARK-17154 > URL: https://issues.apache.org/jira/browse/SPARK-17154 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2, 2.0.0 >Reporter: Kousuke Saruta > Attachments: Name-conflicts-2.pdf, Solution_Proposal_SPARK-17154.pdf > > > When we join two DataFrames which are originated from a same DataFrame, > operations to the joined DataFrame can fail. > One reproducible example is as follows. > {code} > val df = Seq( > (1, "a", "A"), > (2, "b", "B"), > (3, "c", "C"), > (4, "d", "D"), > (5, "e", "E")).toDF("col1", "col2", "col3") > val filtered = df.filter("col1 != 3").select("col1", "col2") > val joined = filtered.join(df, filtered("col1") === df("col1"), "inner") > val selected1 =
[jira] [Closed] (SPARK-18384) explanation of maxMemoryInMB in treeParams should be written more in API doc
[ https://issues.apache.org/jira/browse/SPARK-18384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chie hayashida closed SPARK-18384.
--
    Resolution: Invalid

It has already been fixed.

> explanation of maxMemoryInMB in treeParams should be written more in API doc
>
> Key: SPARK-18384
> URL: https://issues.apache.org/jira/browse/SPARK-18384
> Project: Spark
> Issue Type: Documentation
> Reporter: chie hayashida
> Priority: Minor
>
> The explanation of maxMemoryInMB in treeParams is too brief in the Scala API doc.
> We should mention more about this parameter's effect.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-18384) explanation of maxMemoryInMB in treeParams should be written more in API doc
chie hayashida created SPARK-18384:
--

Summary: explanation of maxMemoryInMB in treeParams should be written more in API doc
Key: SPARK-18384
URL: https://issues.apache.org/jira/browse/SPARK-18384
Project: Spark
Issue Type: Documentation
Reporter: chie hayashida
Priority: Minor

The explanation of maxMemoryInMB in treeParams is too brief in the Scala API doc.
We should mention more about this parameter's effect.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
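As context for the parameter under discussion, a small, hedged PySpark sketch of where maxMemoryInMB is set when training a tree model. The value 512 and the `training_df` DataFrame are illustrative assumptions, not recommendations from this issue; the comment paraphrases the parameter description in the Spark API docs.

{code:python}
from pyspark.ml.classification import DecisionTreeClassifier

# maxMemoryInMB caps the memory allocated to histogram aggregation while
# choosing splits. If it is too small, only one node is split per iteration,
# so training needs more passes over the data; larger values let more nodes
# be split per pass at the cost of more memory.
dt = DecisionTreeClassifier(featuresCol="features", labelCol="label",
                            maxDepth=5, maxMemoryInMB=512)

# model = dt.fit(training_df)  # `training_df`: a hypothetical DataFrame with
#                              # "label" and "features" columns
{code}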
[jira] [Comment Edited] (SPARK-17154) Wrong result can be returned or AnalysisException can be thrown after self-join or similar operations
[ https://issues.apache.org/jira/browse/SPARK-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15650402#comment-15650402 ] chie hayashida edited comment on SPARK-17154 at 11/9/16 9:08 AM: - I faced this problem. How is the progress? was (Author: hayashidac): I'm facing this problem. How is the progress? > Wrong result can be returned or AnalysisException can be thrown after > self-join or similar operations > - > > Key: SPARK-17154 > URL: https://issues.apache.org/jira/browse/SPARK-17154 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2, 2.0.0 >Reporter: Kousuke Saruta > Attachments: Name-conflicts-2.pdf, Solution_Proposal_SPARK-17154.pdf > > > When we join two DataFrames which are originated from a same DataFrame, > operations to the joined DataFrame can fail. > One reproducible example is as follows. > {code} > val df = Seq( > (1, "a", "A"), > (2, "b", "B"), > (3, "c", "C"), > (4, "d", "D"), > (5, "e", "E")).toDF("col1", "col2", "col3") > val filtered = df.filter("col1 != 3").select("col1", "col2") > val joined = filtered.join(df, filtered("col1") === df("col1"), "inner") > val selected1 = joined.select(df("col3")) > {code} > In this case, AnalysisException is thrown. > Another example is as follows. > {code} > val df = Seq( > (1, "a", "A"), > (2, "b", "B"), > (3, "c", "C"), > (4, "d", "D"), > (5, "e", "E")).toDF("col1", "col2", "col3") > val filtered = df.filter("col1 != 3").select("col1", "col2") > val rightOuterJoined = filtered.join(df, filtered("col1") === df("col1"), > "right") > val selected2 = rightOuterJoined.select(df("col1")) > selected2.show > {code} > In this case, we will expect to get the answer like as follows. > {code} > 1 > 2 > 3 > 4 > 5 > {code} > But the actual result is as follows. > {code} > 1 > 2 > null > 4 > 5 > {code} > The cause of the problems in the examples is that the logical plan related to > the right side DataFrame and the expressions of its output are re-created in > the analyzer (at ResolveReference rule) when a DataFrame has expressions > which have a same exprId each other. > Re-created expressions are equally to the original ones except exprId. > This will happen when we do self-join or similar pattern operations. > In the first example, df("col3") returns a Column which includes an > expression and the expression have an exprId (say id1 here). > After join, the expresion which the right side DataFrame (df) has is > re-created and the old and new expressions are equally but exprId is renewed > (say id2 for the new exprId here). > Because of the mismatch of those exprIds, AnalysisException is thrown. > In the second example, df("col1") returns a column and the expression > contained in the column is assigned an exprId (say id3). > On the other hand, a column returned by filtered("col1") has an expression > which has the same exprId (id3). > After join, the expressions in the right side DataFrame are re-created and > the expression assigned id3 is no longer present in the right side but > present in the left side. > So, referring df("col1") to the joined DataFrame, we get col1 of right side > which includes null. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17154) Wrong result can be returned or AnalysisException can be thrown after self-join or similar operations
[ https://issues.apache.org/jira/browse/SPARK-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15650402#comment-15650402 ] chie hayashida commented on SPARK-17154: I'm facing this problem. How is the progress? > Wrong result can be returned or AnalysisException can be thrown after > self-join or similar operations > - > > Key: SPARK-17154 > URL: https://issues.apache.org/jira/browse/SPARK-17154 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2, 2.0.0 >Reporter: Kousuke Saruta > Attachments: Name-conflicts-2.pdf, Solution_Proposal_SPARK-17154.pdf > > > When we join two DataFrames which are originated from a same DataFrame, > operations to the joined DataFrame can fail. > One reproducible example is as follows. > {code} > val df = Seq( > (1, "a", "A"), > (2, "b", "B"), > (3, "c", "C"), > (4, "d", "D"), > (5, "e", "E")).toDF("col1", "col2", "col3") > val filtered = df.filter("col1 != 3").select("col1", "col2") > val joined = filtered.join(df, filtered("col1") === df("col1"), "inner") > val selected1 = joined.select(df("col3")) > {code} > In this case, AnalysisException is thrown. > Another example is as follows. > {code} > val df = Seq( > (1, "a", "A"), > (2, "b", "B"), > (3, "c", "C"), > (4, "d", "D"), > (5, "e", "E")).toDF("col1", "col2", "col3") > val filtered = df.filter("col1 != 3").select("col1", "col2") > val rightOuterJoined = filtered.join(df, filtered("col1") === df("col1"), > "right") > val selected2 = rightOuterJoined.select(df("col1")) > selected2.show > {code} > In this case, we will expect to get the answer like as follows. > {code} > 1 > 2 > 3 > 4 > 5 > {code} > But the actual result is as follows. > {code} > 1 > 2 > null > 4 > 5 > {code} > The cause of the problems in the examples is that the logical plan related to > the right side DataFrame and the expressions of its output are re-created in > the analyzer (at ResolveReference rule) when a DataFrame has expressions > which have a same exprId each other. > Re-created expressions are equally to the original ones except exprId. > This will happen when we do self-join or similar pattern operations. > In the first example, df("col3") returns a Column which includes an > expression and the expression have an exprId (say id1 here). > After join, the expresion which the right side DataFrame (df) has is > re-created and the old and new expressions are equally but exprId is renewed > (say id2 for the new exprId here). > Because of the mismatch of those exprIds, AnalysisException is thrown. > In the second example, df("col1") returns a column and the expression > contained in the column is assigned an exprId (say id3). > On the other hand, a column returned by filtered("col1") has an expression > which has the same exprId (id3). > After join, the expressions in the right side DataFrame are re-created and > the expression assigned id3 is no longer present in the right side but > present in the left side. > So, referring df("col1") to the joined DataFrame, we get col1 of right side > which includes null. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
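For readers hitting the same problem, a hedged PySpark sketch of the workaround that the comment's Example 2 demonstrates in Scala: rename the columns on one side before the self-join so the two sides no longer share attribute names. This is a port written under the assumption of a Spark 2.x SparkSession; the Scala examples above remain the authoritative ones.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, 1, 1), (1, 2, 3), (1, 4, 5), (2, 2, 4), (2, 5, 7), (2, 8, 8)],
    ["id", "value1", "value2"])

# Rename the right-hand side's columns (as in Example 2) so the join condition
# cannot silently resolve both sides to the same underlying attributes.
df2 = df.select(F.col("id").alias("id2"),
                F.col("value1").alias("value11"),
                F.col("value2").alias("value22"))

df3 = df.join(df2, (df["id"] == df2["id2"]) & (df["value2"] <= df2["value22"]))
df3.show()
{code}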
[jira] [Commented] (SPARK-13770) Document the ML feature Interaction
[ https://issues.apache.org/jira/browse/SPARK-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611044#comment-15611044 ]

chie hayashida commented on SPARK-13770:

I added examples and documentation. Please check them.

> Document the ML feature Interaction
> ---
>
> Key: SPARK-13770
> URL: https://issues.apache.org/jira/browse/SPARK-13770
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, ML
> Affects Versions: 1.6.0
> Reporter: Abbass Marouni
> Priority: Minor
>
> The ML feature Interaction
> (http://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/feature/Interaction.html)
> is not included in the documentation of ML features. It'd be nice to provide
> a working example and some documentation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
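Since the issue asks for a working example, here is a hedged sketch of the Interaction transformer. Note the assumption: the Python wrapper (pyspark.ml.feature.Interaction) only appeared in later Spark releases; at the time of this issue the transformer was available from Scala/Java only, so the documented example may need to be written in Scala instead.

{code:python}
from pyspark.sql import SparkSession
from pyspark.ml.feature import Interaction, VectorAssembler

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], ["x", "a", "b"])

# Pack the numeric columns "a" and "b" into a single vector column first.
assembler = VectorAssembler(inputCols=["a", "b"], outputCol="vec")
assembled = assembler.transform(df)

# Interaction emits one vector containing the products of all combinations of
# one value taken from each input column: here (x * a, x * b).
interaction = Interaction(inputCols=["x", "vec"], outputCol="interacted")
interaction.transform(assembled).show(truncate=False)
{code}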
[jira] [Commented] (SPARK-16987) Add spark-default.conf property to define https port for spark history server
[ https://issues.apache.org/jira/browse/SPARK-16987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605225#comment-15605225 ]

chie hayashida commented on SPARK-16987:

Can I work on this issue?

> Add spark-default.conf property to define https port for spark history server
> -
>
> Key: SPARK-16987
> URL: https://issues.apache.org/jira/browse/SPARK-16987
> Project: Spark
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Yesha Vora
> Priority: Minor
>
> With SPARK-2750, the Spark History server UI becomes accessible on an https port.
> Currently, the https port is pre-defined as the http port + 400.
> The Spark History server UI https port should not be pre-defined; it should be configurable.
> Thus, Spark should introduce a new property to make the Spark History server https port configurable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16988) spark history server log needs to be fixed to show https url when ssl is enabled
[ https://issues.apache.org/jira/browse/SPARK-16988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602484#comment-15602484 ] chie hayashida commented on SPARK-16988: Can I work on this issue? > spark history server log needs to be fixed to show https url when ssl is > enabled > > > Key: SPARK-16988 > URL: https://issues.apache.org/jira/browse/SPARK-16988 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.0 >Reporter: Yesha Vora >Priority: Minor > > When spark ssl is enabled, spark history server ui ( http://host:port) is > redirected to https://host:port+400. > So, spark history server log should be updated to print https url instead > http url > {code:title=spark HS log} > 16/08/09 15:21:11 INFO ServerConnector: Started > ServerConnector@3970a5ee{SSL-HTTP/1.1}{0.0.0.0:18481} > 16/08/09 15:21:11 INFO Server: Started @4023ms > 16/08/09 15:21:11 INFO Utils: Successfully started service on port 18081. > 16/08/09 15:21:11 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and > started at http://xxx:18081 > 16/08/09 15:22:52 INFO FsHistoryProvider: Replaying log path: > hdfs://xxx:8020/yy/application_1470756121646_0001.inprogress{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-16988) spark history server log needs to be fixed to show https url when ssl is enabled
[ https://issues.apache.org/jira/browse/SPARK-16988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chie hayashida updated SPARK-16988: --- Comment: was deleted (was: Can I work on it?) > spark history server log needs to be fixed to show https url when ssl is > enabled > > > Key: SPARK-16988 > URL: https://issues.apache.org/jira/browse/SPARK-16988 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.0 >Reporter: Yesha Vora >Priority: Minor > > When spark ssl is enabled, spark history server ui ( http://host:port) is > redirected to https://host:port+400. > So, spark history server log should be updated to print https url instead > http url > {code:title=spark HS log} > 16/08/09 15:21:11 INFO ServerConnector: Started > ServerConnector@3970a5ee{SSL-HTTP/1.1}{0.0.0.0:18481} > 16/08/09 15:21:11 INFO Server: Started @4023ms > 16/08/09 15:21:11 INFO Utils: Successfully started service on port 18081. > 16/08/09 15:21:11 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and > started at http://xxx:18081 > 16/08/09 15:22:52 INFO FsHistoryProvider: Replaying log path: > hdfs://xxx:8020/yy/application_1470756121646_0001.inprogress{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16988) spark history server log needs to be fixed to show https url when ssl is enabled
[ https://issues.apache.org/jira/browse/SPARK-16988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602442#comment-15602442 ] chie hayashida commented on SPARK-16988: Can I work on it? > spark history server log needs to be fixed to show https url when ssl is > enabled > > > Key: SPARK-16988 > URL: https://issues.apache.org/jira/browse/SPARK-16988 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.0 >Reporter: Yesha Vora >Priority: Minor > > When spark ssl is enabled, spark history server ui ( http://host:port) is > redirected to https://host:port+400. > So, spark history server log should be updated to print https url instead > http url > {code:title=spark HS log} > 16/08/09 15:21:11 INFO ServerConnector: Started > ServerConnector@3970a5ee{SSL-HTTP/1.1}{0.0.0.0:18481} > 16/08/09 15:21:11 INFO Server: Started @4023ms > 16/08/09 15:21:11 INFO Utils: Successfully started service on port 18081. > 16/08/09 15:21:11 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and > started at http://xxx:18081 > 16/08/09 15:22:52 INFO FsHistoryProvider: Replaying log path: > hdfs://xxx:8020/yy/application_1470756121646_0001.inprogress{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org