short-circuit local reads cannot be used
Hi, Every time I start the spark-shell I encounter this message: 14/11/18 00:27:43 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. Any idea how to overcome it? The short-circuit feature is a big performance boost I don't want to lose.. Thanks, Daniel
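In case it helps: that warning usually means the JVM simply can't find libhadoop.so on its library path, so HDFS falls back to the remote read path. A rough sketch of how I'd check and fix it (the native-library path below is only an example, adjust it to your install):

```shell
# Ask Hadoop whether its native library loads at all
# (recent Hadoop builds ship this subcommand):
hadoop checknative -a

# If libhadoop does not load, point the JVM at the directory that
# contains libhadoop.so, e.g. in conf/spark-env.sh:
export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:$LD_LIBRARY_PATH
```

After restarting spark-shell, the BlockReaderLocal warning should go away if the library is found.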
Re: Short Circuit Local Reads
On Tue, Sep 30, 2014 at 6:28 PM, Andrew Ash wrote:
> Thanks for the research Kay!
>
> It does seem addressed, and hopefully fixed in that ticket conversation also
> in https://issues.apache.org/jira/browse/HDFS-4697. So the best thing here
> is to wait to upgrade to a version of Hadoop that has that fix and then
> repeat the test. Right now that will be quite a while for me (at least
> early 2015) but I'd be interested in hearing from people who are already on
> CDH5+ attempting to replicate the above experiment.

If you want to test the remote read path on cdh4 without readahead, you can set both dfs.datanode.readahead.bytes and dfs.client.cache.readahead to 0. This might help give a fairer comparison with short-circuit.

SCR also maintains a cache of recently used file descriptors whose size is specified by dfs.client.read.shortcircuit.streams.cache.size. You could try increasing this number and see if it helps at all. In CDH4, it was set at a relatively low 100, and the cache expiry time (specified by dfs.client.read.shortcircuit.streams.cache.expiry.ms) was also set at a relatively low 5000 ms (5 seconds), so you could try playing with those knobs. When this cache hits, we completely avoid the overhead of passing a file descriptor, calling JNI routines, and opening the file descriptor on the DN side.

It would also be interesting to see CPU consumption numbers. In general, one of the benefits of SCR is reduced CPU consumption, which may or may not be a benefit depending on what your job is bottlenecked on. We also find that workloads involving a lot of seeks benefit greatly... the original rationale for SCR was HBase.

I would also advise dropping the caches in between your experiments using "echo 3 > /proc/sys/vm/drop_caches". You will need to shut everything down first because pages that are in use or dirty are not purged.
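For anyone wanting to try these knobs: they go in hdfs-site.xml. The property names are the ones Colin lists above, but the non-zero values below are only illustrative starting points, not recommendations:

```xml
<!-- Disable readahead on the remote read path for a fairer comparison -->
<property>
  <name>dfs.datanode.readahead.bytes</name>
  <value>0</value>
</property>
<property>
  <name>dfs.client.cache.readahead</name>
  <value>0</value>
</property>

<!-- SCR file-descriptor cache: CDH4 defaults were 100 entries / 5000 ms -->
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>1000</value> <!-- illustrative; CDH4 default was 100 -->
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
  <value>60000</value> <!-- illustrative; CDH4 default was 5000 -->
</property>
```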
In general, 17 GB is not a lot of data on a modern machine, and I would expect things like VM startup time and what's in the page cache at the beginning to make non-trivial contributions to the numbers unless you are careful.

> Cheers,
> Andrew
>
> On Tue, Sep 30, 2014 at 2:26 PM, Kay Ousterhout wrote:
>>
>> Hi Andrew and Gary,
>>
>> I've done some experimentation with this and had similar results. I can't
>> explain the speedup in write performance, but I dug into the read slowdown
>> and found that enabling short-circuit reads results in Hadoop not doing
>> read-ahead in the same way. At a high level, when SCR is off, HDFS does
>> read-ahead on input data, so much of the time spent reading input data is
>> pipelined with computation. There were some bugs with SCR where, when SCR
>> was turned on, reading no longer got pipelined, slowing down performance.
>> In particular, I believe that non-short-circuited reads use fadvise to tell
>> the OS to read the file in the background, which is not done with short
>> circuit reads.

It's not fadvise, but it is the "readahead" system call on Linux. Since it is a blocking system call, we need worker threads to do this in the background. We don't use the "readahead" system call for short-circuit reads on cdh5. Part of the reason that hasn't been implemented yet is that one of the main advantages of short-circuit is reduced CPU consumption, and we felt spawning more threads might cut into that. We could implement it pretty easily if people wanted it, but the biggest users of SCR (HBase, Impala) have not requested it yet, so we haven't.

best,
Colin

>> This problem is partially described in
>> https://issues.apache.org/jira/browse/HDFS-5634, a seemingly unrelated JIRA
>> that mentions this way down in some of the comments. This was supposedly
>> fixed in newer versions of Hadoop but I haven't verified it.
>> >> -Kay >> >>> >>> >>> -- Forwarded message -- >>> From: Andrew Ash >>> Date: Tue, Sep 30, 2014 at 1:33 PM >>> Subject: Re: Short Circuit Local Reads >>> To: Matei Zaharia >>> Cc: "user@spark.apache.org" , Gary Malouf >>> >>> >>> >>> Hi Gary, >>> >>> I gave this a shot on a test cluster of CDH4.7 and actually saw a >>> regression in performance when running the numbers. Have you done any >>> benchmarking? Below are my numbers: >>> >>> >>> >>> Experimental method: >>> 1. Write 14GB of data to HDFS via [1] >>> 2. Read data multiple times via [2] >>> >>> >>> Experiment 1: run on virtual machines >>> >>> >>> With short-circuit read disabled: >>> 14/09/24 15:10:49 INFO spark.SparkContext: Job
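To make Colin's pipelining point concrete, here is a toy, self-contained sketch (not HDFS code, all names invented for illustration) of the pattern he describes: a background worker thread prefetches chunks into a bounded queue, so the consumer's computation overlaps with I/O instead of blocking on each read, which is roughly what the readahead worker threads buy the non-short-circuit path.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ReadaheadSketch {
    static final byte[] POISON = new byte[0];   // end-of-stream marker

    // Sum all bytes of the stream while a background thread prefetches
    // chunk-sized blocks into a bounded queue ("depth" chunks ahead).
    static long sumWithReadahead(InputStream in, int chunk, int depth) throws Exception {
        BlockingQueue<byte[]> q = new ArrayBlockingQueue<>(depth);
        Thread prefetcher = new Thread(() -> {
            try {
                while (true) {
                    byte[] buf = new byte[chunk];
                    int n = in.read(buf);
                    if (n < 0) { q.put(POISON); return; }
                    byte[] got = new byte[n];
                    System.arraycopy(buf, 0, got, 0, n);
                    q.put(got);                 // blocks when "depth" chunks ahead
                }
            } catch (Exception e) { throw new RuntimeException(e); }
        });
        prefetcher.start();
        long sum = 0;
        for (byte[] b = q.take(); b != POISON; b = q.take()) {
            for (byte v : b) sum += v & 0xFF;   // "computation", overlapped with I/O
        }
        prefetcher.join();
        return sum;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1 << 16];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 7);
        long got = sumWithReadahead(new ByteArrayInputStream(data), 4096, 4);
        long want = 0;
        for (byte v : data) want += v & 0xFF;   // sequential reference result
        if (got != want) throw new AssertionError(got + " != " + want);
        System.out.println("sum=" + got);
    }
}
```

This also shows Colin's trade-off: the overlap costs an extra thread (and queue copies), which is exactly the CPU overhead they were reluctant to add to the short-circuit path.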
Re: Short Circuit Local Reads
Thanks for the research Kay! It does seem addressed, and hopefully fixed in that ticket conversation also in https://issues.apache.org/jira/browse/HDFS-4697. So the best thing here is to wait to upgrade to a version of Hadoop that has that fix and then repeat the test. Right now that will be quite a while for me (at least early 2015) but I'd be interested in hearing from people who are already on CDH5+ attempting to replicate the above experiment. Cheers, Andrew On Tue, Sep 30, 2014 at 2:26 PM, Kay Ousterhout wrote: > Hi Andrew and Gary, > > I've done some experimentation with this and had similar results. I can't > explain the speedup in write performance, but I dug into the read slowdown > and found that enabling short-circuit reads results in Hadoop not doing > read-ahead in the same way. At a high level, when SCR is off, HDFS does > read-ahead on input data, so much of the time spent reading input data is > pipelined with computation. There were some bugs with SCR where, when SCR > was turned on, reading no longer got pipelined, slowing down performance. > In particular, I believe that non-short-circuited reads use fadvise to tell > the OS to read the file in the background, which is not done with short > circuit reads. This problem is partially described in > https://issues.apache.org/jira/browse/HDFS-5634, a seemingly unrelated > JIRA that mentions this way down in some of the comments. This was supposedly > fixed in newer versions of Hadoop but I haven't verified it. > > -Kay > > >> >> ---------- Forwarded message ---------- >> From: Andrew Ash >> Date: Tue, Sep 30, 2014 at 1:33 PM >> Subject: Re: Short Circuit Local Reads >> To: Matei Zaharia >> Cc: "user@spark.apache.org" , Gary Malouf >> >> >> >> Hi Gary, >> >> I gave this a shot on a test cluster of CDH4.7 and actually saw a >> regression in performance when running the numbers. Have you done any >> benchmarking? Below are my numbers: >> >> >> >> Experimental method: >> 1. Write 14GB of data to HDFS via [1] >> 2. 
Read data multiple times via [2] >> >> >> Experiment 1: run on virtual machines >> >> >> With short-circuit read disabled: >> 14/09/24 15:10:49 INFO spark.SparkContext: Job finished: >> saveAsTextFile at :13, took 344.931469949 s >> 14/09/24 15:11:30 INFO spark.SparkContext: Job finished: count at >> :13, took 18.601568871 s >> 14/09/24 15:11:54 INFO spark.SparkContext: Job finished: count at >> :13, took 16.531909024 s >> 14/09/24 15:12:18 INFO spark.SparkContext: Job finished: count at >> :13, took 17.639692651 s >> 14/09/24 15:12:38 INFO spark.SparkContext: Job finished: count at >> :13, took 16.773438345 s >> >> With short-circuit read enabled: >> 14/09/24 14:28:38 INFO spark.SparkContext: Job finished: >> saveAsTextFile at :13, took 299.511103592 s >> 14/09/24 14:29:17 INFO spark.SparkContext: Job finished: count at >> :13, took 22.459146194 s >> 14/09/24 14:29:44 INFO spark.SparkContext: Job finished: count at >> :13, took 19.806642815 s >> 14/09/24 14:30:11 INFO spark.SparkContext: Job finished: count at >> :13, took 20.284644308 s >> 14/09/24 14:30:40 INFO spark.SparkContext: Job finished: count at >> :13, took 21.720455219 s >> >> >> My summary here is that enabling short-circuit read caused the write >> to go faster (what?) and caused a slight decrease in read performance, >> from ~17sec to ~20sec. >> >> The VMs were backed by FusionIO drives but I thought maybe there was >> something funky with the VMs so switched to bare hardware in a second >> experiment. 
>> >> >> Experiment 2: run on bare hardware >> >> With short-circuit read disabled: >> 14/09/24 15:59:11 INFO spark.SparkContext: Job finished: >> saveAsTextFile at :13, took 1605.965203162 s >> 14/09/24 15:59:39 INFO spark.SparkContext: Job finished: count at >> :13, took 11.984355461 s >> 14/09/24 16:00:00 INFO spark.SparkContext: Job finished: count at >> :13, took 11.134712764 s >> 14/09/24 16:00:11 INFO spark.SparkContext: Job finished: count at >> :13, took 8.694292372 s >> 14/09/24 16:00:24 INFO spark.SparkContext: Job finished: count at >> :13, took 9.83986823 s >> >> With short-circuit read enabled: >> 14/09/24 16:23:14 INFO spark.SparkContext: Job finished: >> saveAsTextFile at :13, took 1113.897715871 s >> 14/09/24 16:25:19 INFO spark.SparkContext: Job finished: count at >> :13, took 14.249690605 s >> 14/09/24 16:25:47 INFO spark.SparkContext: Job finished: count at >> :13, took 12.673301
Re: Short Circuit Local Reads
Hi Andrew and Gary, I've done some experimentation with this and had similar results. I can't explain the speedup in write performance, but I dug into the read slowdown and found that enabling short-circuit reads results in Hadoop not doing read-ahead in the same way. At a high level, when SCR is off, HDFS does read-ahead on input data, so much of the time spent reading input data is pipelined with computation. There were some bugs with SCR where, when SCR was turned on, reading no longer got pipelined, slowing down performance. In particular, I believe that non-short-circuited reads use fadvise to tell the OS to read the file in the background, which is not done with short circuit reads. This problem is partially described in https://issues.apache.org/jira/browse/HDFS-5634, a seemingly unrelated JIRA that mentions this way down in some of the comments. This was supposedly fixed in newer versions of Hadoop but I haven't verified it. -Kay > > ---------- Forwarded message ---------- > From: Andrew Ash > Date: Tue, Sep 30, 2014 at 1:33 PM > Subject: Re: Short Circuit Local Reads > To: Matei Zaharia > Cc: "user@spark.apache.org" , Gary Malouf > > > > Hi Gary, > > I gave this a shot on a test cluster of CDH4.7 and actually saw a > regression in performance when running the numbers. Have you done any > benchmarking? Below are my numbers: > > > > Experimental method: > 1. Write 14GB of data to HDFS via [1] > 2. 
Read data multiple times via [2] > > > Experiment 1: run on virtual machines > > > With short-circuit read disabled: > 14/09/24 15:10:49 INFO spark.SparkContext: Job finished: > saveAsTextFile at :13, took 344.931469949 s > 14/09/24 15:11:30 INFO spark.SparkContext: Job finished: count at > :13, took 18.601568871 s > 14/09/24 15:11:54 INFO spark.SparkContext: Job finished: count at > :13, took 16.531909024 s > 14/09/24 15:12:18 INFO spark.SparkContext: Job finished: count at > :13, took 17.639692651 s > 14/09/24 15:12:38 INFO spark.SparkContext: Job finished: count at > :13, took 16.773438345 s > > With short-circuit read enabled: > 14/09/24 14:28:38 INFO spark.SparkContext: Job finished: > saveAsTextFile at :13, took 299.511103592 s > 14/09/24 14:29:17 INFO spark.SparkContext: Job finished: count at > :13, took 22.459146194 s > 14/09/24 14:29:44 INFO spark.SparkContext: Job finished: count at > :13, took 19.806642815 s > 14/09/24 14:30:11 INFO spark.SparkContext: Job finished: count at > :13, took 20.284644308 s > 14/09/24 14:30:40 INFO spark.SparkContext: Job finished: count at > :13, took 21.720455219 s > > > My summary here is that enabling short-circuit read caused the write > to go faster (what?) and caused a slight decrease in read performance, > from ~17sec to ~20sec. > > The VMs were backed by FusionIO drives but I thought maybe there was > something funky with the VMs so switched to bare hardware in a second > experiment. 
> > > Experiment 2: run on bare hardware > > With short-circuit read disabled: > 14/09/24 15:59:11 INFO spark.SparkContext: Job finished: > saveAsTextFile at :13, took 1605.965203162 s > 14/09/24 15:59:39 INFO spark.SparkContext: Job finished: count at > :13, took 11.984355461 s > 14/09/24 16:00:00 INFO spark.SparkContext: Job finished: count at > :13, took 11.134712764 s > 14/09/24 16:00:11 INFO spark.SparkContext: Job finished: count at > :13, took 8.694292372 s > 14/09/24 16:00:24 INFO spark.SparkContext: Job finished: count at > :13, took 9.83986823 s > > With short-circuit read enabled: > 14/09/24 16:23:14 INFO spark.SparkContext: Job finished: > saveAsTextFile at :13, took 1113.897715871 s > 14/09/24 16:25:19 INFO spark.SparkContext: Job finished: count at > :13, took 14.249690605 s > 14/09/24 16:25:47 INFO spark.SparkContext: Job finished: count at > :13, took 12.67330165 s > 14/09/24 16:26:04 INFO spark.SparkContext: Job finished: count at > :13, took 10.673825924 s > 14/09/24 16:26:19 INFO spark.SparkContext: Job finished: count at > :13, took 9.722516379 s > > > This is separate hardware so the numbers are very different (it's not > just bypassing the VM overhead). > > Again, the writes are much faster (1605s -> 1113s) but the reads are > comparable if not slightly slower (~10.4s -> ~11.8s) > > > > > To make sure that short circuit reads were actually working I looked > at the datanode logs and saw the following line. I think this > confirms that a) the read was local (127.0.0.1 -> 127.0.0.1) from > Spark and b) short-circuit read was successfully used ("success: > true"). > > hadoop-datanode-mybox.local.log:2014-09-24 16:26:52,800 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace
Re: Short Circuit Local Reads
Hi Gary, I gave this a shot on a test cluster of CDH4.7 and actually saw a regression in performance when running the numbers. Have you done any benchmarking? Below are my numbers: Experimental method: 1. Write 14GB of data to HDFS via [1] 2. Read data multiple times via [2] *Experiment 1: run on virtual machines* With short-circuit read *disabled*: 14/09/24 15:10:49 INFO spark.SparkContext: Job finished: saveAsTextFile at :13, took 344.931469949 s 14/09/24 15:11:30 INFO spark.SparkContext: Job finished: count at :13, took 18.601568871 s 14/09/24 15:11:54 INFO spark.SparkContext: Job finished: count at :13, took 16.531909024 s 14/09/24 15:12:18 INFO spark.SparkContext: Job finished: count at :13, took 17.639692651 s 14/09/24 15:12:38 INFO spark.SparkContext: Job finished: count at :13, took 16.773438345 s With short-circuit read *enabled*: 14/09/24 14:28:38 INFO spark.SparkContext: Job finished: saveAsTextFile at :13, took 299.511103592 s 14/09/24 14:29:17 INFO spark.SparkContext: Job finished: count at :13, took 22.459146194 s 14/09/24 14:29:44 INFO spark.SparkContext: Job finished: count at :13, took 19.806642815 s 14/09/24 14:30:11 INFO spark.SparkContext: Job finished: count at :13, took 20.284644308 s 14/09/24 14:30:40 INFO spark.SparkContext: Job finished: count at :13, took 21.720455219 s My summary here is that enabling short-circuit read caused the write to go faster (what?) and caused a slight decrease in read performance, from ~17sec to ~20sec. The VMs were backed by FusionIO drives but I thought maybe there was something funky with the VMs so switched to bare hardware in a second experiment. 
*Experiment 2: run on bare hardware* With short-circuit read *disabled*: 14/09/24 15:59:11 INFO spark.SparkContext: Job finished: saveAsTextFile at :13, took 1605.965203162 s 14/09/24 15:59:39 INFO spark.SparkContext: Job finished: count at :13, took 11.984355461 s 14/09/24 16:00:00 INFO spark.SparkContext: Job finished: count at :13, took 11.134712764 s 14/09/24 16:00:11 INFO spark.SparkContext: Job finished: count at :13, took 8.694292372 s 14/09/24 16:00:24 INFO spark.SparkContext: Job finished: count at :13, took 9.83986823 s With short-circuit read *enabled*: 14/09/24 16:23:14 INFO spark.SparkContext: Job finished: saveAsTextFile at :13, took 1113.897715871 s 14/09/24 16:25:19 INFO spark.SparkContext: Job finished: count at :13, took 14.249690605 s 14/09/24 16:25:47 INFO spark.SparkContext: Job finished: count at :13, took 12.67330165 s 14/09/24 16:26:04 INFO spark.SparkContext: Job finished: count at :13, took 10.673825924 s 14/09/24 16:26:19 INFO spark.SparkContext: Job finished: count at :13, took 9.722516379 s This is separate hardware so the numbers are very different (it's not just bypassing the VM overhead). Again, the writes are much faster (1605s -> 1113s) but the reads are comparable if not slightly slower (~10.4s -> ~11.8s) To make sure that short circuit reads were actually working I looked at the datanode logs and saw the following line. I think this confirms that a) the read was local (127.0.0.1 -> 127.0.0.1) from Spark and b) short-circuit read was successfully used ("success: true"). hadoop-datanode-mybox.local.log:2014-09-24 16:26:52,800 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: -312380305519226759, srvID: DS-96112752-10.201.12.105-50010-1411586696381, success: true Has anyone actually deployed this feature and benchmarked gains? I was hoping to throw this switch on my clusters and get a 30% perf boost but in practice that has not materialized. 
Cheers! Andrew [1] sc.parallelize(1 to (14*1024*1024)).map(k => Seq(k, org.apache.commons.lang.RandomStringUtils.random(1024, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).saveAsTextFile("hdfs:///tmp/output") [2] sc.textFile("hdfs:///tmp/output").count On Wed, Sep 17, 2014 at 11:19 AM, Matei Zaharia wrote: > I'm pretty sure it does help, though I don't have any numbers for it. In > any case, Spark will automatically benefit from this if you link it to a > version of HDFS that contains this. > > Matei > > On September 17, 2014 at 5:15:47 AM, Gary Malouf (malouf.g...@gmail.com) > wrote: > > Cloudera had a blog post about this in August 2013: > http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/ > > Has anyone been using this in production - curious as to whether it made a > significant difference from a Spark perspective. > >
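If anyone repeats this experiment, it may be worth folding in Colin's cache-dropping advice from elsewhere in the thread so each read trial starts cold. A rough outline (requires root; the service names below are illustrative and vary by distro):

```shell
# Stop HDFS/Spark first: pages that are dirty or in use are not purged.
sudo service hadoop-hdfs-datanode stop       # illustrative service name
sync                                         # flush dirty pages to disk
echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop page cache + dentries/inodes
sudo service hadoop-hdfs-datanode start
# then re-run the read: sc.textFile("hdfs:///tmp/output").count
```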
Re: Short Circuit Local Reads
I'm pretty sure it does help, though I don't have any numbers for it. In any case, Spark will automatically benefit from this if you link it to a version of HDFS that contains this. Matei On September 17, 2014 at 5:15:47 AM, Gary Malouf (malouf.g...@gmail.com) wrote: Cloudera had a blog post about this in August 2013: http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/ Has anyone been using this in production - curious as to whether it made a significant difference from a Spark perspective.
Short Circuit Local Reads
Cloudera had a blog post about this in August 2013: http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/ Has anyone been using this in production - curious as to whether it made a significant difference from a Spark perspective.