[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265167#comment-16265167
 ] 

Steve Loughran commented on SPARK-22526:


It says it in the javadocs for
[PortableDataStream.open()|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/input/PortableDataStream.scala#L176].
I don't see how it could be made much clearer there. I think it's implicitly
"obvious" that streams need closing, because it's considered standard in all
Java/Scala code. You either make sure the code you call does it (check the docs
or dive into it through the IDE), or you do it yourself. As close() is required
to be idempotent, you can safely call it on an already-closed stream, so you
may as well.

# There's nothing in {{SparkContext.binaryFiles()}}...you could maybe supply a
mention there.
# There's only one use of {{PortableDataStream.open()}} which I can see in the
code: {{PortableDataStream.toArray()}}, and it does do a close() in a finally
block. The uses in the tests are mostly verifying path resolution, except for
those which use toArray(), which, as noted, closes things (see the sketch
below).
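
To make that concrete, here is a minimal sketch of both options: let toArray()
handle the close, or open() and close yourself. It is untested, and the bucket
path is just an example:

{code}
// Option 1: toArray() reads the whole file and closes the stream itself.
val bytesRdd = sc.binaryFiles("s3a://bucket/path").mapValues(_.toArray())

// Option 2: open() the stream yourself and close it in a finally block.
val firstBytes = sc.binaryFiles("s3a://bucket/path").map { case (path, pds) =>
  val in = pds.open()
  try {
    (path, in.read())   // stand-in for real processing of the stream
  } finally {
    in.close()          // close() is idempotent, so an extra call is harmless
  }
}
{code}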

I'm sorry you feel let down by the framework, but there's nothing which can be 
done in the code. At some point in the future someone could add something about 
binaryFiles to the markdown docs...perhaps, once you've used it more, you could 
be the volunteer?

> Spark hangs while reading binary files from S3
> --
>
> Key: SPARK-22526
> URL: https://issues.apache.org/jira/browse/SPARK-22526
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: mohamed imran
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Hi,
> I am using Spark 2.2.0 (a recent version) to read binary files from S3, using
> sc.binaryFiles.
> It works fine for roughly the first 100 file reads, but then it hangs
> indefinitely, for anywhere from 5 up to 40 minutes, much like the Avro file
> read issue (which was fixed in later releases).
> I tried setting fs.s3a.connection.maximum to some larger values, but that
> didn't help.
> Finally I ended up enabling the Spark speculation setting, which again didn't
> help much.
> One thing I observed is that the connection is not closed after every read of
> a binary file from S3.
> Example: sc.binaryFiles("s3a://test/test123.zip")
> Please look into this major issue!






[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-23 Thread mohamed imran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264931#comment-16264931
 ] 

mohamed imran commented on SPARK-22526:
---

[~ste...@apache.org] Thanks. But I have referred to some of the examples given
in the Spark Git repository, and none of the code says "close the input stream
explicitly when using sc.binaryFiles".

Sample program from Git:

https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FileSuite.scala

So kindly update the Spark documentation with clear guidance; it will avoid
wrong assumptions and misunderstandings among developers.

FYI, I haven't tested the code yet. I will test it and keep you posted in some
time.




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264496#comment-16264496
 ] 

Steve Loughran commented on SPARK-22526:


I'm not giving a permanent fix: it's a bug in your code, or in the code you are
invoking, which forgets to close the input stream.

Unless [~srowen] has other ideas, I'd recommend closing this as one of
WORKSFORME, WONTFIX or INVALID.




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-23 Thread mohamed imran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264208#comment-16264208
 ] 

mohamed imran commented on SPARK-22526:
---

[~ste...@apache.org] Thanks, Steve. I will modify my code as you described in
the JIRA and test it. If it works fine, I can use that approach until there is
a permanent fix!




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264095#comment-16264095
 ] 

Steve Loughran commented on SPARK-22526:


# Fix the code you invoke.
# Wrap the code you invoke with something like the following (this is typed
straight into the JIRA, untested, and it should really close the stream in
something which swallows IOEs):

{code}
binaryRdd.map { t =>
  val in = t._2.open()   // t._2 is the PortableDataStream; open() returns a DataInputStream
  try {
    process(in)          // process() stands in for whatever work you do on the stream
  } finally {
    in.close()
  }
}
{code}
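
If you want the close itself not to throw, here is a variant of the same
sketch, still untested, which swallows any IOException raised by close():

{code}
import java.io.IOException

binaryRdd.map { t =>
  val in = t._2.open()
  try {
    process(in)
  } finally {
    try {
      in.close()
    } catch {
      case _: IOException => // ignore: the data has already been processed
    }
  }
}
{code}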





[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-22 Thread mohamed imran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263097#comment-16263097
 ] 

mohamed imran commented on SPARK-22526:
---

[~ste...@apache.org] Yes, that makes sense to me. Is there any workaround for
this?




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263021#comment-16263021
 ] 

Steve Loughran commented on SPARK-22526:


If the input stream doesn't get closed, there is probably going to be a GET
kept open on every request. So I'd make sure that whatever you are running in
your RDD does close it. As the docs of {{PortableDataStream.open()}} say, "The
user of this method is responsible for closing the stream after usage."
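
In RDD terms, that contract looks roughly like the following untested sketch
(the bucket path is made up):

{code}
sc.binaryFiles("s3a://bucket/data").foreach { case (path, pds) =>
  val in = pds.open()   // the caller now owns the stream...
  try {
    println(s"$path starts with byte ${in.read()}")
  } finally {
    in.close()          // ...and must close it, releasing the S3 connection
  }
}
{code}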




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-22 Thread mohamed imran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262489#comment-16262489
 ] 

mohamed imran commented on SPARK-22526:
---

[~ste...@apache.org] Thanks for your suggestions.

I did set the fs.s3a.connection.maximum value up to 1 and checked the
connections using netstat -a | grep CLOSE_WAIT while processing the files from
S3. With every file read, the number of connections in the pool increases, but
the connections are never closed.

Because of this, at some point it hangs indefinitely.

Anyhow, as per your suggestion, I will upgrade from hadoop-2.7.3 to hadoop-2.8
and gather the stats. I'll keep you posted.




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261292#comment-16261292
 ] 

Steve Loughran commented on SPARK-22526:


S3A uses the AWS S3 client, which uses httpclient inside. S3AInputStream
absolutely closes that stream on close(); it even does a lot of work to decide
when to abort the connection instead of reading to the end of the GET, where
that would be more expensive. I wouldn't jump to blaming httpclient.

However, the AWS S3 client does HTTP connection pooling and recycling, sized by
{{fs.s3a.connection.maximum}}; the default is 15. There might be some blocking
while waiting for free connections, so try a larger value (see the sketch at
the end of this comment).

Now, what's needed to track things down? You get to do it yourself, at least
for now, as you are the only person reporting this.

I'd go for:
* getting a thread dump as things block and seeing what the threads are up to
* using "netstat -p tcp" to see which network connections are live
* turning up logging for {{org.apache.hadoop.fs.s3a}} and seeing what it says

Before doing any of that: move off 2.7 to the Hadoop 2.8 binaries. Hadoop 2.8
has a lot of performance and functionality improvements which will never be
backported to 2.7.x, including updated AWS SDK libraries.

If you do find a problem in the S3A client/AWS SDK, the response to a HADOOP
JIRA will be "does it go away if you upgrade?". Save time by doing that before
anything else.
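
As a concrete example of raising the pool size, here is a minimal, untested
sketch using Spark's spark.hadoop.* passthrough for Hadoop options; the
application name and the value 100 are arbitrary:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Any fs.s3a.* option can be handed to the S3A connector via the spark.hadoop.
// prefix; this raises the connection pool above its default of 15.
val conf = new SparkConf()
  .setAppName("s3a-binary-read")
  .set("spark.hadoop.fs.s3a.connection.maximum", "100")

val sc = new SparkContext(conf)
{code}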




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread mohamed imran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257247#comment-16257247
 ] 

mohamed imran commented on SPARK-22526:
---

[~srowen] Yes, I agree that it is an issue with the HTTP API which Spark is
using to read files from S3. I don't see any other issue apart from httpclient,
but as a whole it's an issue related to the Spark framework, which uses this
HTTP client. You need to look at whether the HTTP client opens and closes
connections successfully when using the S3 API.

By default, Spark 2.2.0 uses httpclient-4.5.2.jar and httpcore-4.4.4.jar.

I suspect these HTTP jars have some issues while connecting to S3. Please look
at it.




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257230#comment-16257230
 ] 

Sean Owen commented on SPARK-22526:
---

It could be a problem with S3, or the S3 API. You haven't shown what 
connections aren't getting closed. I'm saying this is not likely an issue with 
Spark itself.




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread mohamed imran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257224#comment-16257224
 ] 

mohamed imran commented on SPARK-22526:
---

[~srowen] I am processing inside a foreach loop, like this:

Example code:

dataframe.collect.foreach { x =>
  val filepath = x.getAs[String]("filepath")
  val ziprdd = sc.binaryFiles(filepath)  // the file name will be e.g. test.zip
  ziprdd.count
}

I don't process Avro files. I am processing binary files, which are compressed
normal CSV files, from S3.

After roughly the 100th to 150th read, Spark hangs while reading from S3.

Hope this info suffices to clarify the issue. Let me know if you need anything
else.




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256986#comment-16256986
 ] 

Sean Owen commented on SPARK-22526:
---

You're only showing a read of one file, a .zip file. I don't see any Avro 
files, or 100 files here. I'd have to close this.




[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-15 Thread mohamed imran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253819#comment-16253819
 ] 

mohamed imran commented on SPARK-22526:
---

Nope. Actually, I am reading some zip files from Spark using sc.binaryFiles,
but after around the 100th file read it hangs indefinitely. I was monitoring
the open connections for every Spark binary file read: none of the connections
were closed after each binary file was read. Yes, I am sure it is not a network
or S3 issue.






[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253694#comment-16253694
 ] 

Sean Owen commented on SPARK-22526:
---

It's not clear what you're reporting. Are you saying it's an Avro issue? What
fixed it, and when? What is not closing which connection? Are you sure it's not
a network or S3 issue?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org