[jira] [Updated] (SPARK-33620) Task not started after filtering

2020-12-02 Thread Vladislav Sterkhov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Sterkhov updated SPARK-33620:
---
Description: 
Hello, I have a problem with very high memory usage on a stack of HDFS input of roughly 2000 GB. With a 300 GB stack the task starts and completes, but we need to run on the unrestricted stack. Please help.

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 
 This is my code:
  {{var filteredRDD = sparkContext.emptyRDD[String]
 for (path <- pathBuffer) {
   val someRDD = sparkContext.textFile(path)
   if (isValidRDD(someRDD))
     filteredRDD = filteredRDD.++(someRDD.filter(row => { ... }))
 }
 hiveService.insertRDD(filteredRDD.repartition(10), outTable)}}
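
One likely source of the deep lineage here is the repeated {{++}} (union) inside the loop: every iteration wraps the previous result in another UnionRDD, and serializing/deserializing that deeply nested graph is what eventually overflows the stack. A minimal sketch of one way to keep the lineage flat, reusing the names from the snippet above ({{keepRow}} is a stand-in for the elided filter predicate, not a function from the original code):

import org.apache.spark.rdd.RDD

// Stand-in for the elided filter predicate from the snippet above.
val keepRow: String => Boolean = row => true

// Build one RDD per path, validate and filter it, then union them all at once.
val perPathRDDs: Seq[RDD[String]] =
  pathBuffer.toSeq
    .map(path => sparkContext.textFile(path))
    .filter(rdd => isValidRDD(rdd))
    .map(rdd => rdd.filter(keepRow))

// sparkContext.union builds a single flat UnionRDD over all inputs, so the
// lineage depth stays constant no matter how many paths pathBuffer holds.
val filteredRDD: RDD[String] =
  if (perPathRDDs.isEmpty) sparkContext.emptyRDD[String]
  else sparkContext.union(perPathRDDs)

hiveService.insertRDD(filteredRDD.repartition(10), outTable)

This should compute the same result as the loop above; only the way the union is expressed changes, so the {{repartition(10)}} before the insert stays as it is.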

 

It has also gone another way: after many iterations, Spark threw a StackOverflowError:

 
  {{java.lang.StackOverflowError
 at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2303)
 at 
java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2596)
 at 
java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2606)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1319)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)}}
 How should I structure my code (repartition plus persist/coalesce) so that the nodes do not crash?

I tried to rebuild the program in different ways: moving the repartitioning and the in-memory/on-disk persistence inside the loop, and setting a large number of partitions (200).

The program either hangs on the “repartition” stage or crashes with exit code 143 (outOfMemory), oddly also throwing a StackOverflowError.
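
If the incremental {{++}} has to stay (for example because the paths are discovered one at a time), a rough alternative is to cut the lineage every few iterations with persist plus checkpoint, so the task graph never grows past a bounded depth. This is only a sketch: the checkpoint directory and the interval of 50 are illustrative assumptions, and {{keepRow}} is the same stand-in for the elided predicate as above.

// Assumed checkpoint location; any reliable HDFS path works.
sparkContext.setCheckpointDir("hdfs:///tmp/spark-checkpoints")
val checkpointEvery = 50 // illustrative interval, tune to the workload

var filteredRDD = sparkContext.emptyRDD[String]
var processed = 0
for (path <- pathBuffer) {
  val someRDD = sparkContext.textFile(path)
  if (isValidRDD(someRDD)) {
    filteredRDD = filteredRDD.++(someRDD.filter(keepRow))
    processed += 1
    if (processed % checkpointEvery == 0) {
      filteredRDD.persist()     // cache so the checkpoint does not recompute everything
      filteredRDD.checkpoint()  // mark the RDD to be materialized to the checkpoint dir
      filteredRDD.count()       // run an action so the checkpoint happens and the lineage is dropped
    }
  }
}
hiveService.insertRDD(filteredRDD.repartition(10), outTable)

Checkpointing writes the intermediate data out to HDFS, so it trades extra I/O for a bounded lineage; with ~2000 GB of input that cost may still be preferable to the StackOverflowError and the hanging repartition stage.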

  was:
Hello, I have a problem with very high memory usage on a stack of HDFS input of roughly 2000 GB. With a 300 GB stack the task starts and completes, but we need to run on the unrestricted stack. Please help.

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 
 This is my code:
  {{var filteredRDD = sparkContext.emptyRDD[String]
 for (path <- pathBuffer) {
   val someRDD = sparkContext.textFile(path)
   if (isValidRDD(someRDD))
     filteredRDD = filteredRDD.++(someRDD.filter(row => { ... }))
 }
 hiveService.insertRDD(filteredRDD.repartition(10), outTable)}}

 


 It has also gone another way: after many iterations, Spark threw a StackOverflowError:

 
  {{java.lang.StackOverflowError
 at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2303)
 at 
java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2596)
 at 
java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2606)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1319)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)}}
 How should I structure my code (repartition plus persist/coalesce) so that the nodes do not crash?

I tried to rebuild the program in different ways: moving the repartitioning and the in-memory/on-disk persistence inside the loop, and setting a large number of partitions (200).

The program either hangs on the “repartition” stage or crashes with exit code 143 (outOfMemory), oddly also throwing a StackOverflowError.


[jira] [Updated] (SPARK-33620) Task not started after filtering

2020-12-02 Thread Vladislav Sterkhov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Sterkhov updated SPARK-33620:
---
Description: 
Hello, I have a problem with very high memory usage on a stack of HDFS input of roughly 2000 GB. With a 300 GB stack the task starts and completes, but we need to run on the unrestricted stack. Please help.

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 
 This is my code:
  {{var filteredRDD = sparkContext.emptyRDD[String]
 for (path <- pathBuffer) {
   val someRDD = sparkContext.textFile(path)
   if (isValidRDD(someRDD))
     filteredRDD = filteredRDD.++(someRDD.filter(row => { ... }))
 }
 hiveService.insertRDD(filteredRDD.repartition(10), outTable)}}

 


 It has also gone another way: after many iterations, Spark threw a StackOverflowError:

 
  {{java.lang.StackOverflowError
 at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2303)
 at 
java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2596)
 at 
java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2606)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1319)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)}}
 How should I structure my code (repartition plus persist/coalesce) so that the nodes do not crash?

I tried to rebuild the program in different ways: moving the repartitioning and the in-memory/on-disk persistence inside the loop, and setting a large number of partitions (200).

The program either hangs on the “repartition” stage or crashes with exit code 143 (outOfMemory), oddly also throwing a StackOverflowError.

  was:
Hello, I have a problem with very high memory usage on a stack of HDFS input of roughly 2000 GB. With a 300 GB stack the task starts and completes, but we need to run on the unrestricted stack. Please help.

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 
This is my code:
 {{var filteredRDD = sparkContext.emptyRDD[String]
for (path <- pathBuffer) {
  val someRDD = sparkContext.textFile(path)
  if (isValidRDD(someRDD))
    filteredRDD = filteredRDD.++(someRDD.filter(row => { ... }))
}
hiveService.insertRDD(filteredRDD.repartition(10), outTable)}}
It has also gone another way: after many iterations, Spark threw a StackOverflowError:

 
 {{java.lang.StackOverflowError
at 
java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2303)
at 
java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2596)
at 
java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2606)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1319)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)}}
How should I structure my code (repartition plus persist/coalesce) so that the nodes do not crash?

I tried to rebuild the program in different ways: moving the repartitioning and the in-memory/on-disk persistence inside the loop, and setting a large number of partitions (200).

The program either hangs on the “repartition” stage or crashes with exit code 143 (outOfMemory), oddly also throwing a StackOverflowError.

[jira] [Updated] (SPARK-33620) Task not started after filtering

2020-12-02 Thread Vladislav Sterkhov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Sterkhov updated SPARK-33620:
---
Description: 
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 
This my code:
 {{var filteredRDD = sparkContext.emptyRDD[String]
for (path <- pathBuffer) {
  val someRDD = sparkContext.textFile(path)
  if (isValidRDD(someRDD))
    filteredRDD = filteredRDD.++(someRDD.filter(row => { ... }))
}
hiveService.insertRDD(filteredRDD.repartition(10), outTable)}}
been other way. When i got StackOverflowError after many iteration spark

 
 {{java.lang.StackOverflowError
at 
java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2303)
at 
java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2596)
at 
java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2606)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1319)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)}}
How i must build my code with repartitional and persist\coalesce for to nodes 
not crashes?

I tried to rebuild the program in different ways, transferring repartitioning 
and saving in memory / disk inside the loop, installed a large number of 
partitions - 200.

The program either hangs on the “repartition” stage or crashes into error code 
143 (outOfMemory), throwing a stackOverflowError in a strange way.

  was:
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 

This my code:

var filteredRDD = sparkContext.emptyRDD[String]
for (path <- pathBuffer) {
  val someRDD = sparkContext.textFile(path)
  if (isValidRDD(someRDD))
    filteredRDD = filteredRDD.++(filteringRDD(someRDD))
}

hiveService.insertRDD(filteredRDD.repartition(10), outTable)


> Task not started after filtering
> 
>
> Key: SPARK-33620
> URL: https://issues.apache.org/jira/browse/SPARK-33620
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Vladislav Sterkhov
>Priority: Major
> Attachments: VlwWJ.png, mgg1s.png
>
>
> Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb 
> memory used task starting and complete, but we need use unlimited stack. 
> Please help
>  
> !VlwWJ.png|width=644,height=150!
>  
> !mgg1s.png|width=651,height=182!
>  
> This my code:
>  {{var filteredRDD = sparkContext.emptyRDD[String]
> for (path<- pathBuffer)\{ 
> val someRDD = sparkContext.textFile(path) 
> if (isValidRDD(someRDD))
>filteredRDD = filteredRDD.++(someRDD.filter(row => {...}) 
> }
> hiveService.insertRDD(filteredRDD.repartition(10), outTable)}}
> {{}}
> {{}}
> been other way. When i got StackOverflowError after many iteration spark
>  
>  {{java.lang.StackOverflowError
> at 
> java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2303)
> at 
> java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2596)
> at 
> java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2606)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1319)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at 
> 

[jira] [Updated] (SPARK-33620) Task not started after filtering

2020-12-01 Thread Vladislav Sterkhov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Sterkhov updated SPARK-33620:
---
Description: 
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 

This my code:

var filteredRDD = sparkContext.emptyRDD[String]
for (path <- pathBuffer) {
  val someRDD = sparkContext.textFile(path)
  if (isValidRDD(someRDD))
    filteredRDD = filteredRDD.++(filteringRDD(someRDD))
}

hiveService.insertRDD(filteredRDD.repartition(10), outTable)

  was:
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 

This my code:

var filteredRDD = sparkContext.emptyRDD[String]
for (path <- pathBuffer) {
  val someRDD = sparkContext.textFile(path)
  if (isValidRDD(someRDD))
    filteredRDD = filteredRDD.++(filterRDD(someRDD))
}

hiveService.insertRDD(filteredRDD.repartition(10), outTable)


> Task not started after filtering
> 
>
> Key: SPARK-33620
> URL: https://issues.apache.org/jira/browse/SPARK-33620
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Vladislav Sterkhov
>Priority: Major
> Attachments: VlwWJ.png, mgg1s.png
>
>
> Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb 
> memory used task starting and complete, but we need use unlimited stack. 
> Please help
>  
> !VlwWJ.png|width=644,height=150!
>  
> !mgg1s.png|width=651,height=182!
>  
> This my code:
> var filteredRDD = sparkContext.emptyRDD[String]
>  for (path<- pathBuffer)
> { val someRDD = sparkContext.textFile(path) if (isValidRDD(someRDD))       
> filteredRDD = filteredRDD.++(filteringRDD(someRDD )) }
> hiveService.insertRDD(filteredRDD.repartition(10), outTable)






[jira] [Updated] (SPARK-33620) Task not started after filtering

2020-12-01 Thread Vladislav Sterkhov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Sterkhov updated SPARK-33620:
---
Description: 
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 

This my code:

var filteredRDD = sparkContext.emptyRDD[String]
for (path <- pathBuffer) {
  val someRDD = sparkContext.textFile(path)
  if (isValidRDD(someRDD))
    filteredRDD = filteredRDD.++(filterRDD(someRDD))
}

hiveService.insertRDD(filteredRDD.repartition(10), outTable)

  was:
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 

This my code:

var filteredRDD = sparkContext.emptyRDD[String]
for (path <- pathBuffer) {
  val someRDD = sparkContext.textFile(path)
  if (isValidRDD(someRDD))
    filteredRDD = filteredRDD.++(filterRDD(someRDD))
}

hiveService.insertRDD(filteredRDD.repartition(10), outTable)


> Task not started after filtering
> 
>
> Key: SPARK-33620
> URL: https://issues.apache.org/jira/browse/SPARK-33620
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Vladislav Sterkhov
>Priority: Major
> Attachments: VlwWJ.png, mgg1s.png
>
>
> Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb 
> memory used task starting and complete, but we need use unlimited stack. 
> Please help
>  
> !VlwWJ.png|width=644,height=150!
>  
> !mgg1s.png|width=651,height=182!
>  
> This my code:
> var filteredRDD = sparkContext.emptyRDD[String]
>  for (path<- pathBuffer)
> { val someRDD = sparkContext.textFile(path)
> if (isValidRDD(someRDD))
>       filteredRDD = filteredRDD.++(filterRDD(someRDD ))
> }
> hiveService.insertRDD(filteredRDD.repartition(10), outTable)






[jira] [Updated] (SPARK-33620) Task not started after filtering

2020-12-01 Thread Vladislav Sterkhov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Sterkhov updated SPARK-33620:
---
Description: 
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 

This my code:

var filteredRDD = sparkContext.emptyRDD[String]
for (path <- pathBuffer) {
  val someRDD = sparkContext.textFile(path)
  if (isValidRDD(someRDD))
    filteredRDD = filteredRDD.++(filterRDD(someRDD))
}

hiveService.insertRDD(filteredRDD.repartition(10), outTable)

  was:
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 

This my code:

var allTrafficRDD = sparkContext.emptyRDD[String]
for (traffic <- trafficBuffer) {
  logger.info("Load traffic path - " + traffic)
  val trafficRDD = sparkContext.textFile(traffic)
  if (isValidTraffic(trafficRDD, isMasterData)) {
    allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD))
  }
}

hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum),
 outTable, isMasterData)


> Task not started after filtering
> 
>
> Key: SPARK-33620
> URL: https://issues.apache.org/jira/browse/SPARK-33620
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Vladislav Sterkhov
>Priority: Major
> Attachments: VlwWJ.png, mgg1s.png
>
>
> Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb 
> memory used task starting and complete, but we need use unlimited stack. 
> Please help
>  
> !VlwWJ.png|width=644,height=150!
>  
> !mgg1s.png|width=651,height=182!
>  
> This my code:
> var filteredRDD = sparkContext.emptyRDD[String]
>  for (path<- pathBuffer) {
>  val someRDD = sparkContext.textFile(path)
>  if (isValidRDD(someRDD))
>       filteredRDD = filteredRDD.++(filterRDD(someRDD ))
> }
>  hiveService.insertRDD(filteredRDD.repartition(10), outTable)






[jira] [Updated] (SPARK-33620) Task not started after filtering

2020-12-01 Thread Vladislav Sterkhov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Sterkhov updated SPARK-33620:
---
Description: 
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png!

 

!mgg1s.png!

 

This my code:

var allTrafficRDD = sparkContext.emptyRDD[String]
for (traffic <- trafficBuffer) {
  logger.info("Load traffic path - " + traffic)
  val trafficRDD = sparkContext.textFile(traffic)
  if (isValidTraffic(trafficRDD, isMasterData)) {
    allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD))
  }
}

hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum),
 outTable, isMasterData)

  was:
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!image-2020-12-01-13-34-17-283.png!


!image-2020-12-01-13-34-31-288.png!

 

This my code:

 

{{var allTrafficRDD = sparkContext.emptyRDD[String]
for (traffic <- trafficBuffer) {
  logger.info("Load traffic path - " + traffic)
  val trafficRDD = sparkContext.textFile(traffic)
  if (isValidTraffic(trafficRDD, isMasterData)) {
    allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD))
  }
}

hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum),
 outTable, isMasterData)}}


> Task not started after filtering
> 
>
> Key: SPARK-33620
> URL: https://issues.apache.org/jira/browse/SPARK-33620
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Vladislav Sterkhov
>Priority: Major
> Attachments: VlwWJ.png, mgg1s.png
>
>
> Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb 
> memory used task starting and complete, but we need use unlimited stack. 
> Please help
>  
> !VlwWJ.png!
>  
> !mgg1s.png!
>  
> This my code:
> var allTrafficRDD = sparkContext.emptyRDD[String]
>  for (traffic <- trafficBuffer) {
>  logger.info("Load traffic path - "+traffic)
>  val trafficRDD = sparkContext.textFile(traffic)
>  if (isValidTraffic(trafficRDD, isMasterData))
> { allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD)) }
> }
>  
> hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum),
>  outTable, isMasterData)






[jira] [Updated] (SPARK-33620) Task not started after filtering

2020-12-01 Thread Vladislav Sterkhov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Sterkhov updated SPARK-33620:
---
Description: 
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png|width=644,height=150!

 

!mgg1s.png|width=651,height=182!

 

This my code:

var allTrafficRDD = sparkContext.emptyRDD[String]
for (traffic <- trafficBuffer) {
  logger.info("Load traffic path - " + traffic)
  val trafficRDD = sparkContext.textFile(traffic)
  if (isValidTraffic(trafficRDD, isMasterData)) {
    allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD))
  }
}

hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum),
 outTable, isMasterData)

  was:
Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb memory 
used task starting and complete, but we need use unlimited stack. Please help

 

!VlwWJ.png!

 

!mgg1s.png!

 

This my code:

var allTrafficRDD = sparkContext.emptyRDD[String]
for (traffic <- trafficBuffer) {
  logger.info("Load traffic path - " + traffic)
  val trafficRDD = sparkContext.textFile(traffic)
  if (isValidTraffic(trafficRDD, isMasterData)) {
    allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD))
  }
}

hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum),
 outTable, isMasterData)


> Task not started after filtering
> 
>
> Key: SPARK-33620
> URL: https://issues.apache.org/jira/browse/SPARK-33620
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Vladislav Sterkhov
>Priority: Major
> Attachments: VlwWJ.png, mgg1s.png
>
>
> Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb 
> memory used task starting and complete, but we need use unlimited stack. 
> Please help
>  
> !VlwWJ.png|width=644,height=150!
>  
> !mgg1s.png|width=651,height=182!
>  
> This my code:
> var allTrafficRDD = sparkContext.emptyRDD[String]
>  for (traffic <- trafficBuffer) {
>  logger.info("Load traffic path - "+traffic)
>  val trafficRDD = sparkContext.textFile(traffic)
>  if (isValidTraffic(trafficRDD, isMasterData))
> { allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD)) }
> }
>  
> hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum),
>  outTable, isMasterData)






[jira] [Updated] (SPARK-33620) Task not started after filtering

2020-12-01 Thread Vladislav Sterkhov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Sterkhov updated SPARK-33620:
---
Attachment: mgg1s.png

> Task not started after filtering
> 
>
> Key: SPARK-33620
> URL: https://issues.apache.org/jira/browse/SPARK-33620
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Vladislav Sterkhov
>Priority: Major
> Attachments: VlwWJ.png, mgg1s.png
>
>
> Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb 
> memory used task starting and complete, but we need use unlimited stack. 
> Please help
>  
> !image-2020-12-01-13-34-17-283.png!
> !image-2020-12-01-13-34-31-288.png!
>  
> This my code:
>  
> {{var allTrafficRDD = sparkContext.emptyRDD[String]
> for (traffic <- trafficBuffer) \{
>   logger.info("Load traffic path - "+traffic)
>   val trafficRDD = sparkContext.textFile(traffic)
>   if (isValidTraffic(trafficRDD, isMasterData)) {
> allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD))
>   }
> }
> 
> hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum),
>  outTable, isMasterData)}}






[jira] [Updated] (SPARK-33620) Task not started after filtering

2020-12-01 Thread Vladislav Sterkhov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Sterkhov updated SPARK-33620:
---
Attachment: VlwWJ.png

> Task not started after filtering
> 
>
> Key: SPARK-33620
> URL: https://issues.apache.org/jira/browse/SPARK-33620
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Vladislav Sterkhov
>Priority: Major
> Attachments: VlwWJ.png
>
>
> Hello i have problem with big memory used ~2000gb hdfs stack. With 300gb 
> memory used task starting and complete, but we need use unlimited stack. 
> Please help
>  
> !image-2020-12-01-13-34-17-283.png!
> !image-2020-12-01-13-34-31-288.png!
>  
> This my code:
>  
> {{var allTrafficRDD = sparkContext.emptyRDD[String]
> for (traffic <- trafficBuffer) \{
>   logger.info("Load traffic path - "+traffic)
>   val trafficRDD = sparkContext.textFile(traffic)
>   if (isValidTraffic(trafficRDD, isMasterData)) {
> allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD))
>   }
> }
> 
> hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum),
>  outTable, isMasterData)}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org