Re: [VOTE] Release Apache AsterixDB 0.9.4.1 and Hyracks 0.3.4.1 (RC2)

2019-02-21 Thread Taewoo Kim
[X] +1 release these packages as Apache AsterixDB 0.9.4.1 and
Apache Hyracks 0.3.4.1

- Verified the SHA256 checksums
- Verified the source builds
- Verified the binary by executing a metadata query on the Web interface
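For anyone reproducing the checks above, the checksum step can be sketched generically. This is an illustrative sketch only: the file name below is a placeholder, not one of the actual release artifacts.

```shell
# Verify a downloaded artifact against its published .sha256 file.
# "artifact.zip" is a placeholder; substitute the real release file,
# and fetch the .sha256 file from dist.apache.org rather than generating it.
printf 'release payload\n' > artifact.zip        # stand-in for the download
sha256sum artifact.zip > artifact.zip.sha256     # stand-in for the published checksum
sha256sum -c artifact.zip.sha256                 # prints "artifact.zip: OK" on a match
```

The `.asc` signature is checked separately with `gpg --verify` after importing the KEYS file.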

Best,
Taewoo


On Thu, Feb 21, 2019 at 7:24 PM Xikui Wang  wrote:

> [X] +1 release these packages as Apache AsterixDB 0.9.4.1 and
> Apache Hyracks 0.3.4.1
>
> - Verified the sha256
> - Tested Twitter feed with drop-in dependencies
>
> On Sat, Feb 16, 2019 at 12:23 PM Mike Carey  wrote:
>
> > [X] +1 release these packages as Apache AsterixDB 0.9.4.1 and Apache
> > Hyracks 0.3.4.1
> >
> > (I downloaded and verified the NCService puzzle piece and it worked like
> a
> > charm.)
> >
> > On 2/15/19 12:03 PM, Ian Maxon wrote:
> > > Hi everyone,
> > >
> > > Please verify and vote on the latest release of Apache AsterixDB
> > >
> > > The change that produced this release and the change to advance the
> > version
> > > are
> > > up for review on Gerrit:
> > >
> > >
> >
> https://asterix-gerrit.ics.uci.edu/#/q/status:open+owner:%22Jenkins+%253Cjenkins%2540fulliautomatix.ics.uci.edu%253E%22
> > >
> > > The release artifacts are as follows:
> > >
> > > AsterixDB Source
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.4.1-source-release.zip
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.4.1-source-release.zip.asc
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.4.1-source-release.zip.sha256
> > >
> > > SHA256:8bdb79294f20ff0140ea46b4a6acf5b787ac1ff3423ec41d5c5c8cdec275000c
> > >
> > > Hyracks Source
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.4.1-source-release.zip
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.4.1-source-release.zip.asc
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.4.1-source-release.zip.sha256
> > >
> > > SHA256:163a879031a270b0a1d5202247d478c7788ac0a5c704c7fb87d515337df54610
> > >
> > > AsterixDB NCService Installer:
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.4.1-binary-assembly.zip
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.4.1-binary-assembly.zip.asc
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.4.1-binary-assembly.zip.sha256
> > >
> > > SHA256:a3961f32aed8283af3cd7b66309770a5cabff426020c9c4a5b699273ad1fa820
> > >
> > > The KEYS file containing the PGP keys used to sign the release can be
> > > found at
> > >
> > > https://dist.apache.org/repos/dist/release/asterixdb/KEYS
> > >
> > > RAT was executed as part of Maven via the RAT maven plugin, but
> > > excludes files that are:
> > >
> > > - data for tests
> > > - procedurally generated,
> > > - or source files which come without a header mentioning their license,
> > >but have an explicit reference in the LICENSE file.
> > >
> > >
> > > The vote is open for 72 hours, or until the necessary number of votes
> > > (3 +1) has been reached.
> > >
> > > Please vote
> > > [ ] +1 release these packages as Apache AsterixDB 0.9.4.1 and
> > > Apache Hyracks 0.3.4.1
> > > [ ] 0 No strong feeling either way
> > > [ ] -1 do not release one or both packages because ...
> > >
> > > Thanks!
> > >
> >
>


Re: [VOTE] Release Apache AsterixDB 0.9.4.1 and Hyracks 0.3.4.1 (RC2)

2019-02-21 Thread Xikui Wang
[X] +1 release these packages as Apache AsterixDB 0.9.4.1 and
Apache Hyracks 0.3.4.1

- Verified the sha256
- Tested Twitter feed with drop-in dependencies

On Sat, Feb 16, 2019 at 12:23 PM Mike Carey  wrote:

> [X] +1 release these packages as Apache AsterixDB 0.9.4.1 and Apache
> Hyracks 0.3.4.1
>
> (I downloaded and verified the NCService puzzle piece and it worked like a
> charm.)
>
> On 2/15/19 12:03 PM, Ian Maxon wrote:
> > Hi everyone,
> >
> > Please verify and vote on the latest release of Apache AsterixDB
> >
> > The change that produced this release and the change to advance the
> version
> > are
> > up for review on Gerrit:
> >
> >
> https://asterix-gerrit.ics.uci.edu/#/q/status:open+owner:%22Jenkins+%253Cjenkins%2540fulliautomatix.ics.uci.edu%253E%22
> >
> > The release artifacts are as follows:
> >
> > AsterixDB Source
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.4.1-source-release.zip
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.4.1-source-release.zip.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.4.1-source-release.zip.sha256
> >
> > SHA256:8bdb79294f20ff0140ea46b4a6acf5b787ac1ff3423ec41d5c5c8cdec275000c
> >
> > Hyracks Source
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.4.1-source-release.zip
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.4.1-source-release.zip.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.4.1-source-release.zip.sha256
> >
> > SHA256:163a879031a270b0a1d5202247d478c7788ac0a5c704c7fb87d515337df54610
> >
> > AsterixDB NCService Installer:
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.4.1-binary-assembly.zip
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.4.1-binary-assembly.zip.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.4.1-binary-assembly.zip.sha256
> >
> > SHA256:a3961f32aed8283af3cd7b66309770a5cabff426020c9c4a5b699273ad1fa820
> >
> > The KEYS file containing the PGP keys used to sign the release can be
> > found at
> >
> > https://dist.apache.org/repos/dist/release/asterixdb/KEYS
> >
> > RAT was executed as part of Maven via the RAT maven plugin, but
> > excludes files that are:
> >
> > - data for tests
> > - procedurally generated,
> > - or source files which come without a header mentioning their license,
> >but have an explicit reference in the LICENSE file.
> >
> >
> > The vote is open for 72 hours, or until the necessary number of votes
> > (3 +1) has been reached.
> >
> > Please vote
> > [ ] +1 release these packages as Apache AsterixDB 0.9.4.1 and
> > Apache Hyracks 0.3.4.1
> > [ ] 0 No strong feeling either way
> > [ ] -1 do not release one or both packages because ...
> >
> > Thanks!
> >
>


Re: Important to Use a Separate Disk for Logging on SSDs

2019-02-21 Thread Till Westmann
I think that mailing lists can be configured to allow attachments, but
apparently this list is not configured that way.

Gerald's suggestion sounds good :)

Cheers,
Till

On 21 Feb 2019, at 10:29, Gerald Sangudi wrote:

> I believe the Apache mailer does not allow attachments, or at least images
> as attachments. I could be wrong, but I think I saw that elsewhere.
>
> If necessary, you can post the image somewhere (e.g., Google Drive) and
> email out a link.
>
> -Gerald
>
>
> On Thu, Feb 21, 2019 at 10:27 AM Chen Luo  wrote:
>
>> I re-attached the image as follows. In case it still doesn't show up, the
>> average point lookup throughput of *SSD for LSM + Logging* is only around
>> *3-4k/s*. When a separate hard disk is used for logging, the average
>> point lookup throughput reaches *30k-40k/s*.
>>
>> [image: image.png]
>>
>> Best regards,
>> Chen Luo
>>
>> On Thu, Feb 21, 2019 at 10:01 AM abdullah alamoudi 
>> wrote:
>>
>>> Thanks for sharing Chen, very interesting.
>>>
>>> The image doesn't show up for me. Not sure if it shows up for others?
>>>
>>> Cheers,
>>> Abdullah.
>>>
>>> On Wed, Feb 20, 2019 at 1:29 PM Chen Luo  wrote:
>>>
>>>> Hi Devs,
>>>>
>>>> Recently I've been running experiments with concurrent ingestion and
>>>> queries on SSDs. I'd like to share an important lesson from my experiments.
>>>> In short, *it is very important (from the performance perspective) to use
>>>> a separate disk for logging, even though SSDs are good at random I/Os*.
>>>>
>>>> The following experiment illustrates this point. I was using YCSB with
>>>> 100GB of base data (100M records, each 1KB). During each experiment, there
>>>> was a constant data arrival process of 3600 records/s. I executed
>>>> concurrent point lookups (uniformly distributed) as fast as possible using
>>>> 16 query threads (to saturate the disk). The page size was set to 4KB. The
>>>> experiments were performed on SSDs. The only difference is that one
>>>> experiment had a separate hard disk for logging, while the other used the
>>>> same SSD for both the LSM data and logging. The point lookup throughput
>>>> over time is plotted below. The negative impact of logging is huge!
>>>>
>>>> [image: image.png]
>>>>
>>>> The reason is that logging needs to frequently force disk writes (in this
>>>> experiment, the log flusher forces 70-80 times per second). Even though the
>>>> disk bandwidth used by the log flusher is small (4-5MB/s), the frequent
>>>> disk forces could seriously impact the overall disk throughput. If you have
>>>> a workload with concurrent data ingestion and queries, please DO consider
>>>> using a separate disk for logging to fully utilize the SSD bandwidth.
>>>>
>>>> Best regards,
>>>> Chen Luo

>>>
>>


Re: Important to Use a Separate Disk for Logging on SSDs

2019-02-21 Thread Gerald Sangudi
I believe the Apache mailer does not allow attachments, or at least images
as attachments. I could be wrong, but I think I saw that elsewhere.

If necessary, you can post the image somewhere (e.g., Google Drive) and
email out a link.

-Gerald


On Thu, Feb 21, 2019 at 10:27 AM Chen Luo  wrote:

> I re-attached the image as follows. In case it still doesn't show up, the
> average point lookup throughput of *SSD for LSM + Logging* is only around
> *3-4k/s*. When a separate hard disk is used for logging, the average
> point lookup throughput reaches *30k-40k/s*.
>
> [image: image.png]
>
> Best regards,
> Chen Luo
>
> On Thu, Feb 21, 2019 at 10:01 AM abdullah alamoudi 
> wrote:
>
>> Thanks for sharing Chen, very interesting.
>>
>> The image doesn't show up for me. Not sure if it shows up for others?
>>
>> Cheers,
>> Abdullah.
>>
>> On Wed, Feb 20, 2019 at 1:29 PM Chen Luo  wrote:
>>
>> > Hi Devs,
>> >
>> > Recently I've been running experiments with concurrent ingestion and
>> > queries on SSDs. I'd like to share an important lesson from my experiments.
>> > In short, *it is very important (from the performance perspective) to use
>> > a separate disk for logging, even though SSDs are good at random I/Os*.
>> >
>> > The following experiment illustrates this point. I was using YCSB with
>> > 100GB of base data (100M records, each 1KB). During each experiment, there
>> > was a constant data arrival process of 3600 records/s. I executed
>> > concurrent point lookups (uniformly distributed) as fast as possible using
>> > 16 query threads (to saturate the disk). The page size was set to 4KB. The
>> > experiments were performed on SSDs. The only difference is that one
>> > experiment had a separate hard disk for logging, while the other used the
>> > same SSD for both the LSM data and logging. The point lookup throughput
>> > over time is plotted below. The negative impact of logging is huge!
>> >
>> > [image: image.png]
>> >
>> > The reason is that logging needs to frequently force disk writes (in this
>> > experiment, the log flusher forces 70-80 times per second). Even though the
>> > disk bandwidth used by the log flusher is small (4-5MB/s), the frequent
>> > disk forces could seriously impact the overall disk throughput. If you have
>> > a workload with concurrent data ingestion and queries, please DO consider
>> > using a separate disk for logging to fully utilize the SSD bandwidth.
>> >
>> > Best regards,
>> > Chen Luo
>> >
>>
>


Re: Important to Use a Separate Disk for Logging on SSDs

2019-02-21 Thread Chen Luo
I re-attached the image as follows. In case it still doesn't show up, the
average point lookup throughput of *SSD for LSM + Logging* is only around
*3-4k/s*. When a separate hard disk is used for logging, the average point
lookup throughput reaches *30k-40k/s*.

[image: image.png]

Best regards,
Chen Luo
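For intuition, the forcing pattern described in this thread can be sketched in a few lines. This is an illustrative toy under stated assumptions, not AsterixDB's actual log flusher: the function name, record sizes, and batching intervals are all made up.

```python
import os
import tempfile

# Hypothetical sketch of a transaction-log writer that periodically forces
# its appends to the device; `force_every` controls the group-commit size.
def append_and_force(path, records, force_every):
    """Append records to the log; fsync after every `force_every` records."""
    forces = 0
    with open(path, "ab") as log:
        for i, rec in enumerate(records, 1):
            log.write(rec)
            if i % force_every == 0:
                log.flush()
                os.fsync(log.fileno())  # the expensive device-level force
                forces += 1
    return forces

records = [b"x" * 64] * 1000  # made-up record size/count for illustration
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "txn.log")
    # Forcing after every record: 1000 device forces for 1000 records.
    per_record = append_and_force(path, records, force_every=1)
    # Grouping 100 records per force: 10 forces for the same data.
    grouped = append_and_force(path, records, force_every=100)
    print(per_record, grouped)  # 1000 10
```

Each force is a synchronous round trip to the device, which is why a log flushing 70-80 times per second can cost concurrent reads far more than its 4-5MB/s bandwidth suggests; batching forces, or moving the log to a separate device as described above, reduces that interference.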

On Thu, Feb 21, 2019 at 10:01 AM abdullah alamoudi 
wrote:

> Thanks for sharing Chen, very interesting.
>
> The image doesn't show up for me. Not sure if it shows up for others?
>
> Cheers,
> Abdullah.
>
> On Wed, Feb 20, 2019 at 1:29 PM Chen Luo  wrote:
>
> > Hi Devs,
> >
> > Recently I've been running experiments with concurrent ingestion and
> > queries on SSDs. I'd like to share an important lesson from my experiments.
> > In short, *it is very important (from the performance perspective) to use
> > a separate disk for logging, even though SSDs are good at random I/Os*.
> >
> > The following experiment illustrates this point. I was using YCSB with
> > 100GB of base data (100M records, each 1KB). During each experiment, there
> > was a constant data arrival process of 3600 records/s. I executed
> > concurrent point lookups (uniformly distributed) as fast as possible using
> > 16 query threads (to saturate the disk). The page size was set to 4KB. The
> > experiments were performed on SSDs. The only difference is that one
> > experiment had a separate hard disk for logging, while the other used the
> > same SSD for both the LSM data and logging. The point lookup throughput
> > over time is plotted below. The negative impact of logging is huge!
> >
> > [image: image.png]
> >
> > The reason is that logging needs to frequently force disk writes (in this
> > experiment, the log flusher forces 70-80 times per second). Even though the
> > disk bandwidth used by the log flusher is small (4-5MB/s), the frequent
> > disk forces could seriously impact the overall disk throughput. If you
> > have a workload with concurrent data ingestion and queries, please DO
> > consider using a separate disk for logging to fully utilize the SSD
> > bandwidth.
> >
> > Best regards,
> > Chen Luo
> >
>


Re: Important to Use a Separate Disk for Logging on SSDs

2019-02-21 Thread abdullah alamoudi
Thanks for sharing Chen, very interesting.

The image doesn't show up for me. Not sure if it shows up for others?

Cheers,
Abdullah.

On Wed, Feb 20, 2019 at 1:29 PM Chen Luo  wrote:

> Hi Devs,
>
> Recently I've been running experiments with concurrent ingestion and
> queries on SSDs. I'd like to share an important lesson from my experiments.
> In short, *it is very important (from the performance perspective) to use
> a separate disk for logging, even though SSDs are good at random I/Os*.
>
> The following experiment illustrates this point. I was using YCSB with
> 100GB of base data (100M records, each 1KB). During each experiment, there
> was a constant data arrival process of 3600 records/s. I executed
> concurrent point lookups (uniformly distributed) as fast as possible using
> 16 query threads (to saturate the disk). The page size was set to 4KB. The
> experiments were performed on SSDs. The only difference is that one
> experiment had a separate hard disk for logging, while the other used the
> same SSD for both the LSM data and logging. The point lookup throughput
> over time is plotted below. The negative impact of logging is huge!
>
> [image: image.png]
>
> The reason is that logging needs to frequently force disk writes (in this
> experiment, the log flusher forces 70-80 times per second). Even though the
> disk bandwidth used by the log flusher is small (4-5MB/s), the frequent
> disk forces could seriously impact the overall disk throughput. If you have
> a workload with concurrent data ingestion and queries, please DO consider
> using a separate disk for logging to fully utilize the SSD bandwidth.
>
> Best regards,
> Chen Luo
>