[jira] [Updated] (SPARK-24078) reduce with unionAll takes a long time
[ https://issues.apache.org/jira/browse/SPARK-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-24078:
--------------------------------
    Labels: bulk-closed  (was: )

> reduce with unionAll takes a long time
> --------------------------------------
>
>                 Key: SPARK-24078
>                 URL: https://issues.apache.org/jira/browse/SPARK-24078
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.6.3
>            Reporter: zhangsongcheng
>            Priority: Major
>              Labels: bulk-closed
>
> I am trying to sample the training set for each category and then union all
> the samples together. This is my code:
>
> def balance4Single(dataSet: DataFrame): DataFrame = {
>   val samples = LabelConf.cardIDList.map { cardID =>
>     val tmpDataSet = dataSet.filter(col("card_id") === cardID)
>     val sample = underSample(tmpDataSet, cardID)
>     sample
>   }
>   samples.reduce((x, y) => x.unionAll(y))
> }
>
> def underSample(dataSet: DataFrame, cardID: String): DataFrame = {
>   val positiveSample = dataSet.filter(col("label") > 0.5).sample(false, 0.1)
>   val negativeSample = dataSet.filter(col("label") < 0.5).sample(false, 0.1)
>   positiveSample.unionAll(negativeSample).distinct()
> }
>
> But the code blocks at {{samples.reduce((x, y) => x.unionAll(y))}}: it runs
> more and more slowly and eventually stops making progress. This has confused
> me for a long time. Can anyone help? Thank you!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
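The report above folds the per-category samples together with a left-to-right reduce, which nests each union inside the previous one. The thread does not confirm a root cause, so the following is only a sketch of one commonly suspected factor: the nesting depth of the combined plan grows linearly with the number of samples, whereas a pairwise (tree-shaped) reduce keeps it logarithmic. `Plan`, `Leaf`, `Union`, `depth` and `treeReduce` are hypothetical names introduced for illustration; they are plain Scala stand-ins, not Spark classes.

```scala
// Hypothetical stand-in for a query plan node; NOT a Spark class.
sealed trait Plan
case class Leaf(name: String) extends Plan
case class Union(left: Plan, right: Plan) extends Plan

object BalancedUnion {
  // Nesting depth of a plan: a leaf counts as 1.
  def depth(p: Plan): Int = p match {
    case Leaf(_)     => 1
    case Union(l, r) => 1 + math.max(depth(l), depth(r))
  }

  // Pairwise reduce: combine neighbours level by level, so the result
  // is a balanced tree instead of a left-deep chain.
  def treeReduce[A](items: Seq[A])(combine: (A, A) => A): A = {
    require(items.nonEmpty, "treeReduce needs at least one element")
    if (items.size == 1) items.head
    else {
      val next = items.grouped(2).map {
        case Seq(a, b) => combine(a, b) // a full pair
        case Seq(a)    => a             // odd element carried to the next level
      }.toVector
      treeReduce(next)(combine)
    }
  }

  // 64 hypothetical per-category samples.
  val leaves: Seq[Plan] = (1 to 64).map(i => Leaf(s"sample$i"))

  // Same shape as samples.reduce((x, y) => x.unionAll(y)) in the report.
  val leftDeep: Plan = leaves.reduce((x, y) => Union(x, y))

  // Balanced alternative built from the same inputs.
  val balanced: Plan = treeReduce(leaves)(Union(_, _))
}
```

With 64 inputs, `BalancedUnion.depth` gives 64 for `leftDeep` and 7 for `balanced`. Applied to the reporter's code, the same helper would be called as `treeReduce(samples)(_ unionAll _)`; whether that alone removes the slowdown on Spark 1.6.3 is an assumption that would need to be measured.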
[jira] [Updated] (SPARK-24078) reduce with unionAll takes a long time
[ https://issues.apache.org/jira/browse/SPARK-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangsongcheng updated SPARK-24078:
----------------------------------
    Description:
I am trying to sample the training set for each category and then union all the samples together. This is my code:

def balance4Single(dataSet: DataFrame): DataFrame = {
  val samples = LabelConf.cardIDList.map { cardID =>
    val tmpDataSet = dataSet.filter(col("card_id") === cardID)
    val sample = underSample(tmpDataSet, cardID)
    sample
  }
  samples.reduce((x, y) => x.unionAll(y))
}

def underSample(dataSet: DataFrame, cardID: String): DataFrame = {
  val positiveSample = dataSet.filter(col("label") > 0.5).sample(false, 0.1)
  val negativeSample = dataSet.filter(col("label") < 0.5).sample(false, 0.1)
  positiveSample.unionAll(negativeSample).distinct()
}

But the code blocks at {{samples.reduce((x, y) => x.unionAll(y))}}: it runs more and more slowly and eventually stops making progress. This has confused me for a long time. Can anyone help? Thank you!

  was:
I am trying to sample the training set for each category and then union all the samples together. This is my code:

{{ def balanceCategory(dataSet: DataFrame): DataFrame = }}
{{{}}
 val samples = LabelConf.categories.map { category =>
{{ val tmpDataSet = dataSet.filter(col("category_id") === category)}}
 val sample = underSample(tmpDataSet, category)
 sample
 }
{{ samples.reduce((x, y) => x.unionAll(y))}}
 }

{{ def underSample(dataSet: DataFrame, cardID: String): DataFrame = {
 val positiveSample = dataSet.filter(col("label") > 0.5).sample(false, 0.1)}}
{{ val negativeSample = dataSet.filter(col("label") < 0.5).sample(false, 0.1)}}
{{ val positiveSample.unionAll(negativeSample)}}
 }

But the code blocks at `{{samples.reduce((x, y) => x.unionAll(y))`}}: it runs more and more slowly and eventually stops making progress. This has confused me for a long time. Can anyone help? Thank you!
[jira] [Updated] (SPARK-24078) reduce with unionAll takes a long time
[ https://issues.apache.org/jira/browse/SPARK-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangsongcheng updated SPARK-24078:
----------------------------------
    Description:
I am trying to sample the training set for each category and then union all the samples together. This is my code:

{{ def balanceCategory(dataSet: DataFrame): DataFrame = }}
{{{}}
 val samples = LabelConf.categories.map { category =>
{{ val tmpDataSet = dataSet.filter(col("category_id") === category)}}
 val sample = underSample(tmpDataSet, category)
 sample
 }
{{ samples.reduce((x, y) => x.unionAll(y))}}
 }

{{ def underSample(dataSet: DataFrame, cardID: String): DataFrame = {
 val positiveSample = dataSet.filter(col("label") > 0.5).sample(false, 0.1)}}
{{ val negativeSample = dataSet.filter(col("label") < 0.5).sample(false, 0.1)}}
{{ val positiveSample.unionAll(negativeSample)}}
 }

But the code blocks at `{{samples.reduce((x, y) => x.unionAll(y))`}}: it runs more and more slowly and eventually stops making progress. This has confused me for a long time. Can anyone help? Thank you!

  was:
I am trying to sample the training set for each category and then union all the samples together. This is my code:

{{ def balanceCategory(dataSet: DataFrame): DataFrame = {}}
{{ val samples = LabelConf.categorys.map { }}category =>
{{ val tmpDataSet = dataSet.filter(col("category_id") === category)}}
 val sample = underSample(tmpDataSet, category)
 sample
 }
{{ samples.reduce((x, y) => x.unionAll(y))}}
 }

{{ def underSample(dataSet: DataFrame, cardID: String): DataFrame = {
 val positiveSample = dataSet.filter(col("label") > 0.5).sample(false, 0.1)}}
{{ val negativeSample = dataSet.filter(col("label") < 0.5).sample(false, 0.1)}}
{{ val positiveSample.unionAll(negativeSample)}}
 }

But the code blocks at `{{samples.reduce((x, y) => x.unionAll(y))`}}: it runs more and more slowly and eventually stops making progress. This has confused me for a long time. Can anyone help? Thank you!
[jira] [Updated] (SPARK-24078) reduce with unionAll takes a long time
[ https://issues.apache.org/jira/browse/SPARK-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangsongcheng updated SPARK-24078:
----------------------------------
    Description:
I am trying to sample the training set for each category and then union all the samples together. This is my code:

{{ def balanceCategory(dataSet: DataFrame): DataFrame = {}}
{{ val samples = LabelConf.categorys.map { }}category =>
{{ val tmpDataSet = dataSet.filter(col("category_id") === category)}}
 val sample = underSample(tmpDataSet, category)
 sample
 }
{{ samples.reduce((x, y) => x.unionAll(y))}}
 }

{{ def underSample(dataSet: DataFrame, cardID: String): DataFrame = {
 val positiveSample = dataSet.filter(col("label") > 0.5).sample(false, 0.1)}}
{{ val negativeSample = dataSet.filter(col("label") < 0.5).sample(false, 0.1)}}
{{ val positiveSample.unionAll(negativeSample)}}
 }

But the code blocks at `{{samples.reduce((x, y) => x.unionAll(y))`}}: it runs more and more slowly and eventually stops making progress. This has confused me for a long time. Can anyone help? Thank you!

  was:
I am trying to sample the training set for each category and then union all the samples together. This is my code:

{{ def balanceCategory(dataSet: DataFrame): DataFrame = {}}
{{ val samples = LabelConf.categorys.map { }}{{category => }}
{{ val tmpDataSet = dataSet.filter(col("category_id") === category)}}
{{ val sample = underSample(tmpDataSet, category)
 sample }}
{{ } }}
{{ samples.reduce((x, y) => x.unionAll(y))}}
{{ } }}

{{ def underSample(dataSet: DataFrame, cardID: String): DataFrame = {
 val positiveSample = dataSet.filter(col("label") > 0.5).sample(false, 0.1)}}
{{ val negativeSample = dataSet.filter(col("label") < 0.5).sample(false, 0.1)}}
{{ val positiveSample.unionAll(negativeSample)}}
 }

But the code blocks at `{{samples.reduce((x, y) => x.unionAll(y))`}}: it runs more and more slowly and eventually stops making progress. This has confused me for a long time. Can anyone help? Thank you!