[Problem Solved] Re: Spark partition size tuning
Hi all, the problem has been solved. I mistakenly used
tachyon.user.block.size.bytes instead of
tachyon.user.block.size.bytes.default. It works now. Sorry for the
confusion, and thanks again to Gene!

Best Regards,
Jia

On Wed, Jan 27, 2016 at 4:59 AM, Jia Zou wrote:
> [snip -- earlier messages appear below in this thread]
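For anyone hitting the same issue, the fix amounts to one line in tachyon-site.properties. The value 134217728 (128MB) is the size discussed in this thread; note the trailing ".default" in the property name:

```properties
# Wrong -- this name is not the one read for the default block size:
# tachyon.user.block.size.bytes=134217728

# Correct -- note the trailing ".default":
tachyon.user.block.size.bytes.default=134217728
```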
Re: Spark partition size tuning
Hi, Gene,

Thanks for your suggestion. However, even though I set
tachyon.user.block.size.bytes=134217728, and I can see that value in the
web console, the files that I load into Tachyon via copyToLocal still have
a 512MB block size. Do you have more suggestions?

Best Regards,
Jia

On Tue, Jan 26, 2016 at 11:46 PM, Gene Pang wrote:
> [snip -- earlier messages appear below in this thread]
Re: Spark partition size tuning
Hi Jia,

If you want to change the Tachyon block size, you can set the
tachyon.user.block.size.bytes.default parameter
(http://tachyon-project.org/documentation/Configuration-Settings.html).
You can set it per job via extraJavaOptions, or by adding it to
tachyon-site.properties.

I hope that helps,
Gene

On Mon, Jan 25, 2016 at 8:13 PM, Jia Zou wrote:
> [snip -- earlier messages appear below in this thread]
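The two options Gene describes might look like this. This is a sketch: it assumes the Tachyon client is on the Spark classpath, and the application jar and arguments are placeholders:

```shell
# Per job, via the driver/executor JVM options:
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dtachyon.user.block.size.bytes.default=134217728" \
  --conf "spark.executor.extraJavaOptions=-Dtachyon.user.block.size.bytes.default=134217728" \
  ...

# Or cluster-wide, by adding one line to tachyon-site.properties:
# tachyon.user.block.size.bytes.default=134217728
```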
Re: Spark partition size tuning
Hi,

Maybe sc.hadoopConfiguration.setInt("dfs.blocksize", blockSize) helps you.

Best Regards,
Pavel

On Tue, Jan 26, 2016 at 7:13 AM Jia Zou wrote:
> [snip -- earlier messages appear below in this thread]
Fwd: Spark partition size tuning
Dear all,

First, an update: the local file system data partition size can be tuned
by:

sc.hadoopConfiguration().setLong("fs.local.block.size", blocksize)

However, I also need to tune the Spark data partition size for input data
that is stored in Tachyon (the default is 512MB), and the above method
doesn't work for Tachyon data.

Do you have any suggestions? Thanks very much!

Best Regards,
Jia

-- Forwarded message --
From: Jia Zou
Date: Thu, Jan 21, 2016 at 10:05 PM
Subject: Spark partition size tuning
To: "user @spark"

Dear all!

When using Spark to read from the local file system, the default partition
size is 32MB. How can I increase the partition size to 128MB, to reduce
the number of tasks?

Thank you very much!

Best Regards,
Jia
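For reference, the local-file tuning above can be sketched in the spark-shell roughly as follows. This assumes an existing SparkContext sc, and the input path is hypothetical; 128L * 1024 * 1024 = 134217728 bytes:

```scala
// Sketch: ask Spark's Hadoop layer to use 128MB splits for local files.
val blockSize = 128L * 1024 * 1024 // 134217728 bytes

sc.hadoopConfiguration.setLong("fs.local.block.size", blockSize)

// Reading a local file should now yield ~128MB partitions rather than
// the 32MB ones mentioned below, i.e. fewer tasks.
val rdd = sc.textFile("file:///data/input.txt") // hypothetical path
println(rdd.partitions.length)
```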
Spark partition size tuning
Dear all!

When using Spark to read from the local file system, the default partition
size is 32MB. How can I increase the partition size to 128MB, to reduce
the number of tasks?

Thank you very much!

Best Regards,
Jia