[jira] [Updated] (SPARK-33620) Task not started after filtering
[ https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladislav Sterkhov updated SPARK-33620:
---------------------------------------
    Description: 

Hello, I have a problem with a large backlog of HDFS input, about 2000 GB. With about 300 GB of input the task starts and completes, but we need to process the backlog without that limit. Please help.

!VlwWJ.png|width=644,height=150!

!mgg1s.png|width=651,height=182!

This is my code:

{code:scala}
var filteredRDD = sparkContext.emptyRDD[String]
for (path <- pathBuffer) {
  val someRDD = sparkContext.textFile(path)
  if (isValidRDD(someRDD))
    filteredRDD = filteredRDD.++(someRDD.filter(row => {...}))
}
hiveService.insertRDD(filteredRDD.repartition(10), outTable)
{code}

After many loop iterations, Spark fails with a StackOverflowError:

{code}
java.lang.StackOverflowError
	at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2303)
	at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2596)
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2606)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1319)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
{code}

How should I structure this code, with repartition and persist/coalesce, so that the nodes do not crash? I have tried rebuilding the program in different ways: moving the repartitioning and the in-memory/on-disk persistence inside the loop, and setting a large number of partitions (200). The program either hangs at the "repartition" stage or exits with code 143 (out of memory), throwing the StackOverflowError in a strange way.
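One detail in the snippet above is worth calling out: each pass through the loop wraps the previous result in another UnionRDD via {{++}}, so the lineage grows one level deeper per input path, and deserializing that deeply nested plan on the executors is a common cause of exactly this StackOverflowError. A possible restructuring, offered only as a sketch, is to union all inputs in a single call and truncate the lineage before the shuffle. The local SparkContext, stub validator, placeholder paths, and placeholder filter predicate below are hypothetical stand-ins for the reporter's environment ({{pathBuffer}}, {{isValidRDD}}, {{hiveService}}, and the elided {{{...}}} predicate):

```scala
// Sketch only: a flat union plus checkpoint, as one way to keep the RDD
// lineage shallow. All helper names and paths are hypothetical stand-ins.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FlatUnionSketch {
  def main(args: Array[String]): Unit = {
    val sparkContext = new SparkContext(
      new SparkConf().setAppName("flat-union-sketch").setMaster("local[2]"))
    sparkContext.setCheckpointDir("/tmp/spark-checkpoints") // placeholder dir

    val pathBuffer = Seq("/data/in1", "/data/in2")          // placeholder paths
    def isValidRDD(r: RDD[String]): Boolean = true          // stand-in validator

    // Build the per-path RDDs first, then union them once.
    // SparkContext.union produces a single flat UnionRDD over N parents,
    // instead of the N-deep chain created by calling ++ inside the loop.
    val validRDDs: Seq[RDD[String]] =
      pathBuffer
        .map(sparkContext.textFile(_))
        .filter(isValidRDD)
        .map(_.filter(row => row.nonEmpty))                 // stand-in predicate

    val filteredRDD: RDD[String] = sparkContext.union(validRDDs)

    // checkpoint() materializes the data at the next action and drops the
    // lineage, so the task closure serialized for the repartition stage
    // stays small.
    filteredRDD.checkpoint()

    filteredRDD.repartition(10) // then e.g. hiveService.insertRDD(..., outTable)
    sparkContext.stop()
  }
}
```

If restructuring is not an option, the overflow is sometimes postponed by enlarging the JVM thread stack on spark-submit, e.g. `--conf spark.executor.extraJavaOptions=-Xss16m` (and the `spark.driver.extraJavaOptions` equivalent), but that only buys extra depth; it does not flatten the lineage.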
> Task not started after filtering
> --------------------------------
>
>                 Key: SPARK-33620
>                 URL: https://issues.apache.org/jira/browse/SPARK-33620
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.4.7
>            Reporter: Vladislav Sterkhov
>            Priority: Major
>         Attachments: VlwWJ.png, mgg1s.png
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org