Re: [VOTE] Release Apache AsterixDB 0.9.4.1 and Hyracks 0.3.4.1 (RC2)
[X] +1 release these packages as Apache AsterixDB 0.9.4.1 and Apache Hyracks 0.3.4.1

- Verified the SHA256 signatures
- Verified the source builds
- Verified the binary by executing a metadata query on the Web interface

Best,
Taewoo

On Thu, Feb 21, 2019 at 7:24 PM Xikui Wang wrote:

> [X] +1 release these packages as Apache AsterixDB 0.9.4.1 and Apache Hyracks 0.3.4.1
>
> - Verified the sha256
> - Tested Twitter feed with drop-in dependencies
>
> On Sat, Feb 16, 2019 at 12:23 PM Mike Carey wrote:
>
>> [X] +1 release these packages as Apache AsterixDB 0.9.4.1 and Apache Hyracks 0.3.4.1
>>
>> (I downloaded and verified the NCService puzzle piece and it worked like a charm.)
>>
>> On 2/15/19 12:03 PM, Ian Maxon wrote:
>>> Hi everyone,
>>>
>>> Please verify and vote on the latest release of Apache AsterixDB.
>>>
>>> The change that produced this release and the change to advance the version are up for review on Gerrit:
>>>
>>> https://asterix-gerrit.ics.uci.edu/#/q/status:open+owner:%22Jenkins+%253Cjenkins%2540fulliautomatix.ics.uci.edu%253E%22
>>>
>>> The release artifacts are as follows:
>>>
>>> AsterixDB Source
>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.4.1-source-release.zip
>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.4.1-source-release.zip.asc
>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.4.1-source-release.zip.sha256
>>> SHA256: 8bdb79294f20ff0140ea46b4a6acf5b787ac1ff3423ec41d5c5c8cdec275000c
>>>
>>> Hyracks Source
>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.4.1-source-release.zip
>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.4.1-source-release.zip.asc
>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.4.1-source-release.zip.sha256
>>> SHA256: 163a879031a270b0a1d5202247d478c7788ac0a5c704c7fb87d515337df54610
>>>
>>> AsterixDB NCService Installer:
>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.4.1-binary-assembly.zip
>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.4.1-binary-assembly.zip.asc
>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.4.1-binary-assembly.zip.sha256
>>> SHA256: a3961f32aed8283af3cd7b66309770a5cabff426020c9c4a5b699273ad1fa820
>>>
>>> The KEYS file containing the PGP keys used to sign the release can be found at
>>> https://dist.apache.org/repos/dist/release/asterixdb/KEYS
>>>
>>> RAT was executed as part of Maven via the RAT Maven plugin, but excludes files that are:
>>> - data for tests,
>>> - procedurally generated,
>>> - or source files that come without a header mentioning their license but have an explicit reference in the LICENSE file.
>>>
>>> The vote is open for 72 hours, or until the necessary number of votes (3 +1) has been reached.
>>>
>>> Please vote:
>>> [ ] +1 release these packages as Apache AsterixDB 0.9.4.1 and Apache Hyracks 0.3.4.1
>>> [ ] 0 No strong feeling either way
>>> [ ] -1 do not release one or both packages because ...
>>>
>>> Thanks!
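[Editor's note] Verifying the published digests is straightforward to script. Below is a minimal sketch in Python; the file path in the usage comment refers to a local download of the artifact, and the helper names (`sha256_hex`, `verify`) are illustrative, not part of any AsterixDB tooling.

```python
import hashlib

def sha256_hex(path: str) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large release zips don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, expected: str) -> bool:
    """Compare a file's digest, case-insensitively, against a published value."""
    return sha256_hex(path) == expected.strip().lower()
```

Usage would look like `verify("apache-asterixdb-0.9.4.1-source-release.zip", "8bdb7929...")`, comparing against the digest published in the corresponding `.sha256` file. The same check can of course be done with `sha256sum -c` on most Linux systems.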
Re: Important to Use a Separate Disk for Logging on SSDs
I think that mailing lists can be configured to allow attachments, but apparently this list is not configured that way. Gerald's suggestion sounds good :)

Cheers,
Till

On 21 Feb 2019, at 10:29, Gerald Sangudi wrote:

> I believe the Apache mailer does not allow attachments, or at least images as attachments. I could be wrong, but I think I saw that elsewhere.
>
> If necessary, you can post the image somewhere (e.g. Google Drive) and email out a link.
>
> -Gerald
>
> On Thu, Feb 21, 2019 at 10:27 AM Chen Luo wrote:
>
>> I re-attached the image as follows. In case it still doesn't show up: the average point lookup throughput of *SSD for LSM + logging* is only around *3-4k/s*. When a separate hard disk is used for logging, the average point lookup throughput reaches *30-40k/s*.
>>
>> [image: image.png]
>>
>> Best regards,
>> Chen Luo
>>
>> On Thu, Feb 21, 2019 at 10:01 AM abdullah alamoudi wrote:
>>
>>> Thanks for sharing Chen, very interesting.
>>>
>>> The image doesn't show up for me. Not sure if it shows up for others?
>>>
>>> Cheers,
>>> Abdullah.
>>>
>>> On Wed, Feb 20, 2019 at 1:29 PM Chen Luo wrote:
>>>
>>>> Hi Devs,
>>>>
>>>> Recently I've been running experiments with concurrent ingestion and queries on SSDs, and I'd like to share an important lesson from my experiments. In short, *it is very important (from a performance perspective) to use a separate disk for logging, even though SSDs are good at random I/Os*.
>>>>
>>>> The following experiment illustrates this point. I was using YCSB with 100 GB of base data (100M records, each 1 KB). During each experiment, there was a constant data arrival process of 3600 records/s. I executed concurrent point lookups (uniformly distributed) as fast as possible using 16 query threads (to saturate the disk). The page size was set to 4 KB. The experiments were performed on SSDs. The only difference is that one experiment had a separate hard disk for logging, while the other used the same SSD for both the LSM data and logging. The point lookup throughput over time is plotted below. The negative impact of logging is huge!
>>>>
>>>> [image: image.png]
>>>>
>>>> The reason is that logging needs to force disk writes frequently (in this experiment, the log flusher forces 70-80 times per second). Even though the disk bandwidth used by the log flusher is small (4-5 MB/s), the frequent forces can seriously degrade the overall disk throughput. If you have a workload with concurrent data ingestion and queries, please DO consider using a separate disk for logging to fully utilize the SSD bandwidth.
>>>>
>>>> Best regards,
>>>> Chen Luo
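[Editor's note] The arithmetic behind the observation: at 70-80 forces per second and 4-5 MB/s of log bandwidth, each force flushes only on the order of 60 KB, so the device's time goes to flush latency rather than sequential bandwidth. The effect is easy to reproduce in miniature; the sketch below (not AsterixDB's actual log flusher, and with absolute numbers that depend entirely on the device and filesystem) times small appends with a per-record `fsync` versus a batched, group-commit-style `fsync`.

```python
import os
import tempfile
import time

def timed_appends(n: int, payload: bytes, fsync_every: int) -> float:
    """Append n records to a fresh log file, calling os.fsync() after
    every `fsync_every` records; return elapsed seconds."""
    fd, path = tempfile.mkstemp(prefix="wal-")
    try:
        start = time.perf_counter()
        for i in range(1, n + 1):
            os.write(fd, payload)
            if i % fsync_every == 0:
                os.fsync(fd)  # force the appended records to stable storage
        os.fsync(fd)  # flush any tail records
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)

if __name__ == "__main__":
    rec = b"x" * 1024  # 1 KB records, matching the YCSB setup above
    per_record = timed_appends(200, rec, fsync_every=1)
    batched = timed_appends(200, rec, fsync_every=50)  # group-commit style
    print(f"fsync per record: {per_record:.3f}s, batched: {batched:.3f}s")
```

On most devices the per-record variant is markedly slower even though both write the same bytes, which is the same mechanism that lets a dedicated (even slower) log disk absorb the forces while the SSD serves reads at full bandwidth.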