Re: [DISCUSS] Altering storage write rate limiting, and adding read rate limiting

2023-05-14 Thread Chen Luo
Hi Ian,

I remembered I introduced this rate limiter before :)

For 1 and 2, the original thought was that log writes would be much smaller
than LSM writes (each record is written to the log once, but is flushed once
and then merged multiple times). By bounding the flush/merge rate, the log
write rate is automatically bounded too. That being said, there shouldn't be
any issue with rate limiting log writes more explicitly, especially if there
are records that are not written to LSM indexes.

I don't really remember the details of 3. From what I can tell from my
experiments, everything worked as expected with the previous implementation.
But please do feel free to change it if you think it doesn't make sense
(and run some simple workloads to make sure the change behaves as expected).
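
For reference on question 3 below: a minimal, hedged sketch (not AsterixDB
code; the class name is just for illustration) of Guava's two public
RateLimiter factories. As far as I can tell, the time argument on the second
factory configures a warmup period, not a burst window, which matches Ian's
reading of the docs.

    import java.util.concurrent.TimeUnit;
    import com.google.common.util.concurrent.RateLimiter;

    public final class RateLimiterFactories {
        public static void main(String[] args) {
            // SmoothBursty: only the rate is configurable; the burst window
            // is fixed internally (roughly one second's worth of permits).
            RateLimiter bursty = RateLimiter.create(1000.0);
            // SmoothWarmingUp: the extra time argument is a warmup period
            // during which the effective rate ramps up, not a burst size.
            RateLimiter warming = RateLimiter.create(1000.0, 5, TimeUnit.SECONDS);
            bursty.acquire();    // blocks as needed to respect 1000 permits/s
            warming.acquire(8);  // acquiring several permits at once is also allowed
        }
    }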

Best regards,
Chen Luo

On Sat, May 13, 2023 at 3:35 PM Mike Carey  wrote:

> I've been meaning to reply to this - it's time!  I suspect that #1 is
> just historical, and happened because the limiter's initial focus was on
> keeping LSM activity (in particular big merges) from overrunning the
> other activities in the system.  It seems like this is the right move,
> IMO.  I wouldn't worry about back-compat for this - it seems fine since
> by default this is not enabled (so I don't think we have use cases that
> are depending on it not changing).
>
> On 5/3/23 11:43 AM, Ian Maxon wrote:
> > Hi everyone,
> > I've been working on a patch that adds read rate limiting to AsterixDB so
> > that multiple NCs can share the same disk in a more cooperative fashion.
> > Clearly since we already have write rate limiting, I looked at how that
> is
> > done as a first step. It seemed easy enough to perform the read rate
> > limiting precisely the same way as the write rate limiting, just that it
> > needs to be for all IO, not just LSM IO. So, the best place to put the
> > limiter seemed to be in the IOManager.  Given that, I had a few questions
> > that came to mind:
> >
> > 1. Why is the write-rate limiting in the LSM page write callback
> instead
> > of further down, like in the IOManager?
> > 2. Is there any downside to moving it to the IOManager?
> > 3. Our rate limiter at the moment wraps Guava's RateLimiter. We
> specify
> > in the constructor a maxBurstSeconds argument, but where we give
> this to
> > Guava the meaning is not clear to me. From my reading of the docs
> and the
> > source comments, it seems like what we pass as a burst time is
> actually a
> > warmup time, and that the burst time is fixed to be 1 second within
> Guava's
> > RateLimiter as long as you use public constructors. Is this the case
> or am
> > I misreading something?
> >
> >
> > The main concern I have is with backwards compatibility, because I would
> > like to reuse the current config parameter for write rate limiting to
> just
> > mean all writes going through the IOManager, not just LSM writes. If they
> > both achieve the same purpose then it would be fine, but if they are
> > different then of course something else needs to be done.
> > I have a WIP patch up to illustrate, it has a couple warts but hopefully
> > the idea is clear: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17497
> >
> > Any thoughts are much appreciated.
> >
> > Thanks,
> >
> > - Ian
> >


Re: Serious Hash Collisions in Hash Join/Groupby

2020-09-26 Thread Chen Luo
This fix is orthogonal to using different hash functions for each join
level. We also need to use a different hash function for level 0 (even when
there is no spilling at all) in order to avoid hash collisions caused by
the hash partitioning operator.

On Sat, Sep 26, 2020 at 5:26 PM Mike Carey  wrote:

> I vaguely remember a similar performance bug years ago -- maybe that was
> w.r.t. the hash function(s) used in the series of levels when joining
> really big data.  There I think we ended up moving through a
> family/series of hash functions.  Does this fix do the same...?  (Just
> wondering if there's already other infrastructure in the code for this.)
>
> On 9/26/20 10:15 AM, Chen Luo wrote:
> > Hi team,
> >
> > Recently I found a serious performance bug for the current hash
> > join/groupby implementation [1] and submitted a fix for it [2]. The
> damage
> > caused by this bug depends on the query itself and the underlying data,
> but
> > for a specific join query TPC-H Q5 whose initial query time was very
> > strange, *the query time was reduced from ~3500s to ~1000s on 16 nodes
> > after applying this fix*.
> >
> > When we perform hash join/groupby, keys are first hash partitioned into
> > each NC partition (hash1(key)%P, where P is the number of NC partitions).
> > Within each NC partition, a hash table is built by hashing each key to a
> > slot (hash2(key)%N, where N is the number of slots). The key problem is
> > that we used the same hash function for both hash1 and hash2! In general,
> > this may lead to a lot of hash collisions. To see this problem, consider
> > what happens with NC partition 0. We know that all keys assigned to NC
> > partition 0 must satisfy hash(key) % P == 0. Now suppose we have 16 NC
> > partitions (P = 16) and N is a multiple of 4. Since hash(key)%16 == 0, we
> > know that hash(key)%N must be a multiple of 4! This means all slots whose
> > index is not a multiple of 4 will be empty, and all keys will be clustered
> > into slots that are multiples of 4. To fix this problem, we can simply use a
> > different hash function for hash join/groupby.
> >
> > If you are running experiments related to join queries and have seen some
> > unexpected performance results, it'll be helpful to try this fix to see
> > what happens.
> >
> > Best regards,
> > Chen Luo
> >
> > [1] https://issues.apache.org/jira/browse/ASTERIXDB-2783
> > [2] https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/8123
> >
>


Serious Hash Collisions in Hash Join/Groupby

2020-09-26 Thread Chen Luo
Hi team,

Recently I found a serious performance bug in the current hash
join/groupby implementation [1] and submitted a fix for it [2]. The damage
caused by this bug depends on the query itself and the underlying data, but
for one specific join query, TPC-H Q5, whose initial query time was
suspiciously high, *the query time was reduced from ~3500s to ~1000s on 16
nodes after applying this fix*.

When we perform hash join/groupby, keys are first hash partitioned into
each NC partition (hash1(key)%P, where P is the number of NC partitions).
Within each NC partition, a hash table is built by hashing each key to a
slot (hash2(key)%N, where N is the number of slots). The key problem is
that we used the same hash function for both hash1 and hash2! In general,
this may lead to a lot of hash collisions. To see this problem, consider
what happens with NC partition 0. We know that all keys assigned to NC
partition 0 must satisfy hash(key) % P == 0. Now suppose we have 16 NC
partitions (P = 16) and N is a multiple of 4. Since hash(key)%16 == 0, we
know that hash(key)%N must be a multiple of 4! This means all slots whose
index is not a multiple of 4 will be empty, and all keys will be clustered
into slots that are multiples of 4. To fix this problem, we can simply use a
different hash function for hash join/groupby.
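
To make the effect concrete, here is a minimal, hedged sketch (not the
AsterixDB code; Integer.hashCode stands in for the shared hash function and
the class name is illustrative) showing how reusing the partitioning hash for
the in-partition hash table leaves most slots empty:

    import java.util.HashSet;
    import java.util.Set;

    public final class SameHashDemo {
        public static void main(String[] args) {
            int P = 16;    // number of NC partitions
            int N = 1024;  // hash table slots (a multiple of 4)
            Set<Integer> usedSlots = new HashSet<>();
            for (int key = 0; key < 1_000_000; key++) {
                int h = Integer.hashCode(key);      // same hash for both purposes
                if (Math.floorMod(h, P) == 0) {     // keys routed to partition 0
                    usedSlots.add(Math.floorMod(h, N));
                }
            }
            // With P = 16 and N = 1024, only slots that are multiples of 16
            // are ever used, i.e. 64 of the 1024 slots.
            System.out.println("distinct slots used: " + usedSlots.size() + " of " + N);
        }
    }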

If you are running experiments related to join queries and have seen some
unexpected performance results, it'll be helpful to try this fix to see
what happens.

Best regards,
Chen Luo

[1] https://issues.apache.org/jira/browse/ASTERIXDB-2783
[2] https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/8123


Increasing Default Lock Table Size

2020-09-24 Thread Chen Luo
Hi devs,

Recently I found an issue where our default lock table size (1024) is too
small to avoid hash collisions. Currently, our default log buffer size is
4MB * 8 = 32MB, which means there could be at most 32MB of unflushed log
records. Suppose each write produces a 100-byte log record (worst case);
then there could be up to 320K ongoing operations. Thus, in
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/8003, I was trying to
increase the default lock table size to 1MB. However, this fails certain
test cases because of out-of-memory errors.
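
(As a quick back-of-the-envelope check using the numbers above: 32 MB of
unflushed log at ~100 bytes per record is roughly 320,000 concurrent lock
entries; hashed into 1,024 slots, that is about 312 entries per slot in the
worst case, hence the collisions.)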

I'm thinking that maybe we should introduce a new parameter, e.g.,
txn.lock.table.size with a default value of 1MB, to make the lock table size
configurable. The tests could still use a smaller lock table size to avoid
test case failures. Please let me know your thoughts.

Best regards,
Chen Luo


Suspicious Spidersilk Test Failure

2020-03-27 Thread Chen Luo
Hi devs,

I encountered a suspicious spidersilk test failure (
https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-spidersilk-tests/1672/).
The stack trace is attached below. It seems that the JDK version may be
wrong. Is there any way to fix this? Or can we safely ignore this failure
for now?

Exception in thread "main" java.lang.NoSuchMethodError:
java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer;
at org.apache.hyracks.ipc.impl.IPCHandle.<init>(IPCHandle.java:55)
at
org.apache.hyracks.ipc.impl.IPCConnectionManager.getIPCHandle(IPCConnectionManager.java:128)
at org.apache.hyracks.ipc.impl.IPCSystem.getHandle(IPCSystem.java:79)
at org.apache.hyracks.ipc.impl.IPCSystem.getHandle(IPCSystem.java:121)
at org.apache.hyracks.ipc.impl.IPCSystem.getHandle(IPCSystem.java:66)
at
org.apache.hyracks.ipc.impl.ReconnectingIPCHandle.<init>(ReconnectingIPCHandle.java:43)
at org.apache.hyracks.ipc.impl.IPCSystem.getHandle(IPCSystem.java:76)
at
org.apache.hyracks.ipc.impl.IPCSystem.getReconnectingHandle(IPCSystem.java:116)
at
org.apache.hyracks.ipc.impl.IPCSystem.getReconnectingHandle(IPCSystem.java:70)
at
org.apache.hyracks.ipc.impl.HyracksConnection.<init>(HyracksConnection.java:114)
at
org.apache.asterix.hyracks.bootstrap.CCApplication.start(CCApplication.java:147)
at
org.apache.hyracks.control.cc.ClusterControllerService.startApplication(ClusterControllerService.java:255)
at
org.apache.hyracks.control.cc.ClusterControllerService.start(ClusterControllerService.java:236)
at org.apache.hyracks.control.cc.CCDriver.main(CCDriver.java:59)

Best regards,
Chen Luo


Re: Checkpoint file not forced to disk

2019-10-28 Thread Chen Luo
Hi Murtadha,

Feel free to abandon my submitted patch. PS - the current transaction log
and LSM disk components also use FileChannel.force() to force data to disk.

Best regards,
Chen Luo

On Mon, Oct 28, 2019 at 6:25 AM Murtadha Hubail  wrote:

> P.S.
> Looks like FileChannel.force() is enough according to [1] and we don't
> need FileDescriptor.sync().
>
> [1]
> https://stackoverflow.com/questions/5650327/are-filechannel-force-and-filedescriptor-sync-both-needed
>
>
> On 10/28/2019, 3:22 PM, "Murtadha Hubail"  wrote:
>
> @Mike Carey,
>
> For this specific system checkpoint file, we have a test that ensures
> even if the file is corrupted, the node will continue its startup and we
> will just force recovery to start from the beginning of the transaction log.
>
> @Chen Luo,
>
> Thanks for checking and highlighting the issue. Looks like forcing a
> file on closing its stream is OS dependent. Apparently, we also need
> FileDescriptor.sync() for non-local storage to guarantee a file is forced.
> We have other metadata files that we need to do this for. If you don't
> mind, I will submit a patch to cover all of these and I will include the
> changes from your patch [1].
>
> Cheers,
> Murtadha
>
> [1] https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/3923
>
> On 10/28/2019, 9:56 AM, "Mike Carey"  wrote:
>
> Sounds right ... Explicit call needed!  Better crash testing also
> needed;
> this one should not be impossible to test for.  Do we have any
> crash
> testing that we can extend?
>
> On Sun, Oct 27, 2019, 7:15 PM Chen Luo  wrote:
>
> > Hi Murtadha,
> >
> > I don't think closing a file will automatically force it to
> disk. It only
> > forces the buffer from JVM to OS. I did some search online and
> also checked
> > the source code of JDK 8. When a file channel is closed, only
> the close
> > function (provided by Linux) is called [1]. The only place that
> calls fsync
> > is inside the FileChannel.force() method [2]. For correctness, I
> believe
> > force should be called explicitly, as we did when an LSM merge is
> > completed.
> >
> > [1]
> >
> >
> https://github.com/frohoff/jdk8u-jdk/blob/da0da73ab82ed714dc5be94acd2f0d00fbdfe2e9/src/solaris/native/sun/nio/ch/FileDispatcherImpl.c#L273
> > [2]
> >
> >
> https://github.com/frohoff/jdk8u-jdk/blob/da0da73ab82ed714dc5be94acd2f0d00fbdfe2e9/src/solaris/native/sun/nio/ch/FileDispatcherImpl.c#L145
> >
> > Best regards,
> > Chen Luo
> >
> > On Sun, Oct 27, 2019 at 5:24 PM Murtadha Hubail <
> hubail...@gmail.com>
> > wrote:
> >
> > > Hi Chen,
> > >
> > > If I'm not mistaken, Files#write ensures that the file is
> closed after
> > > writing the bytes which should flush the file. If not, then we
> probably
> > > should add an explicit flush there.
> > >
> > > Cheers,
> > > Murtadha
> > >
> > > On 10/27/2019, 8:26 PM, "Chen Luo"  wrote:
> > >
> > > Hi devs,
> > >
> > > I noticed the checkpoint file is not forced to disk after
> completion,
> > > but
> > > we still proceed to truncate logs and older checkpoint
> files [1].
> > This
> > > seems to be a bug to me. Also, from my understanding,
> reading the
> > > checkpoint file without forcing to disk will still succeed
> because
> > the
> > > file
> > > can be read from the OS write cache. Is there any other
> > considerations
> > > for
> > > not forcing checkpoint files?
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/asterixdb/blob/2a76a0fe83fc5534c00923cd0f09f8477eac713a/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/recovery/AbstractCheckpointManager.java#L176
> > >
> > >
> > >
> > >
> >
>
>
>
>
>


Re: Checkpoint file not forced to disk

2019-10-27 Thread Chen Luo
Hi Murtadha,

I don't think closing a file will automatically force it to disk. It only
forces the buffer from JVM to OS. I did some search online and also checked
the source code of JDK 8. When a file channel is closed, only the close
function (provided by Linux) is called [1]. The only place that calls fsync
is inside the FileChannel.force() method [2]. For correctness, I believe
force should be called explicitly, as we did when an LSM merge is completed.

[1]
https://github.com/frohoff/jdk8u-jdk/blob/da0da73ab82ed714dc5be94acd2f0d00fbdfe2e9/src/solaris/native/sun/nio/ch/FileDispatcherImpl.c#L273
[2]
https://github.com/frohoff/jdk8u-jdk/blob/da0da73ab82ed714dc5be94acd2f0d00fbdfe2e9/src/solaris/native/sun/nio/ch/FileDispatcherImpl.c#L145
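
As a minimal, hedged sketch (plain JDK NIO, not the AsterixDB checkpoint
code) of the pattern being discussed: write the file, then explicitly force
it to disk before relying on it (e.g., before truncating logs):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public final class ForcedCheckpointWrite {
        public static void writeAndForce(Path checkpointFile, byte[] bytes) throws IOException {
            try (FileChannel channel = FileChannel.open(checkpointFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                ByteBuffer buffer = ByteBuffer.wrap(bytes);
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
                // Closing the channel only releases the descriptor; the data
                // reaches the device only via force(). 'true' also forces metadata.
                channel.force(true);
            }
        }
    }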

Best regards,
Chen Luo

On Sun, Oct 27, 2019 at 5:24 PM Murtadha Hubail  wrote:

> Hi Chen,
>
> If I'm not mistaken, Files#write ensures that the file is closed after
> writing the bytes which should flush the file. If not, then we probably
> should add an explicit flush there.
>
> Cheers,
> Murtadha
>
> On 10/27/2019, 8:26 PM, "Chen Luo"  wrote:
>
> Hi devs,
>
> I noticed the checkpoint file is not forced to disk after completion,
> but
> we still proceed to truncate logs and older checkpoint files [1]. This
> seems to be a bug to me. Also, from my understanding, reading the
> checkpoint file without forcing to disk will still succeed because the
> file
> can be read from the OS write cache. Are there any other considerations for
> not forcing checkpoint files?
>
>
> [1]
>
> https://github.com/apache/asterixdb/blob/2a76a0fe83fc5534c00923cd0f09f8477eac713a/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/recovery/AbstractCheckpointManager.java#L176
>
>
>
>


Checkpoint file not forced to disk

2019-10-27 Thread Chen Luo
Hi devs,

I noticed the checkpoint file is not forced to disk after completion, but
we still proceed to truncate logs and older checkpoint files [1]. This
seems to be a bug to me. Also, from my understanding, reading the
checkpoint file without forcing to disk will still succeed because the file
can be read from the OS write cache. Are there any other considerations for
not forcing checkpoint files?


[1]
https://github.com/apache/asterixdb/blob/2a76a0fe83fc5534c00923cd0f09f8477eac713a/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/recovery/AbstractCheckpointManager.java#L176


Re: Move MergePolicy property validation to policy

2019-06-18 Thread Chen Luo
+1. This change makes sense to me, especially if we want to add new merge
policies in the future; modularizing each merge policy will help with that.

Best regards,
Chen Luo

On Tue, Jun 18, 2019 at 1:46 PM Mike Carey  wrote:

> This sounds like a good evolutionary change to me - others' thoughts?
>
> On 6/18/19 10:09 AM, Merlin Mao wrote:
> > Currently all MergePolicy property names and types are defined in
> > DatasetDeclParametersUtil for validation. It would be better to move them
> > back to the MergePolicy or MergePolicyFactory for mainly 3 reasons:
> >
> > 1. Adding a new MergePolicy or modifying an existing MergePolicy is
> > currently more difficult: you have to register the factory and also
> > register its properties.
> >
> > 2. It is possible that some policies use the same property name but with
> > different constraints. For example, for one policy, "num-components" may
> > allow any integer of at least 1, while another policy only allows integers
> > of at least 2. A property may be numeric for one policy while another
> > policy allows a string for the same property.
> >
> > 3. In the current validation, there is no way to have an optional property
> > or a property with a default value, which would allow certain properties
> > to be omitted when creating datasets.
> >
> > If policy property validation is done in the policy itself, it gives more
> > flexibility to design new policies and improve the current policies.
> >
>


Re: Firewall Blocked Local Host Site

2019-05-31 Thread Chen Luo
Microsoft has a tutorial on this topic:
https://support.microsoft.com/en-us/help/4028544/windows-10-turn-windows-defender-firewall-on-or-off
(see "Allow an app through the firewall").

The student needs to follow these steps and allow AsterixDB (or
Java/NCService/CCService, depending on the name of the AsterixDB process)
through the firewall.

Best regards,
Chen Luo



On Fri, May 31, 2019 at 8:47 AM Michael Carey  wrote:

> This sounds vaguely familiar - thoughts?  (Anyone?) This is one of a
> couple of hundred temporary Windows customers of AsterixDB... :-)
>
>
>  Forwarded Message 
> Subject:Firewall Blocked Local Host Site
> Date:   Fri, 31 May 2019 15:42:03 + (UTC)
> From:   CS 122A on Piazza 
> Reply-To:   re...@piazza.com
> To: mjca...@ics.uci.edu
>
>
>
> A new Question was posted by Jeremy Anderson.
>
> *Firewall Blocked Local Host Site*
>
> One of the scripts prompted a windows firewall popup and I accidentally
> skipped over it. Now, when I try to use the local host website, it gets
> blocked for me. Does someone know what exactly I have to change on my
> firewall to make it work? If not, can someone post a picture of the
> popup if they come across it?
>
>
>
>


Recent change on removing statement as request body

2019-04-07 Thread Chen Luo
Hi devs,

I noticed there is a recent change on master that removes the undocumented
ability to use the request body as the statement [1]. This patch breaks
many of my experiment scripts and *many data preparation scripts used by
Cloudberry.* Also, I had a hard time modifying my scripts to use the
"statement" parameter because of two difficulties:

   1. It seems that only the first statement is executed but the rest are
   simply ignored;
   2. Double quotes are always ignored by the parser.

Can this patch be reverted? If not, can we at least update our wiki [2][3]
to give examples of running multiple statements and handling double quotes?
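
In the meantime, here is a minimal, hedged sketch of posting a statement via
the "statement" form parameter. The endpoint path and port (/query/service on
19002) are assumptions based on the default setup, and URL-encoding the
statement should keep embedded double quotes intact in the request:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    public final class StatementParamExample {
        public static void main(String[] args) throws Exception {
            String stmt = "SELECT \"a string with double quotes\" AS s;";
            String body = "statement=" + URLEncoder.encode(stmt, StandardCharsets.UTF_8.name());
            // Endpoint and port are assumptions based on the default NC/CC setup.
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://localhost:19002/query/service").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }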

Best regards,
Chen Luo

[1] https://asterix-gerrit.ics.uci.edu/#/c/3267/
[2]
https://cwiki.apache.org/confluence/display/ASTERIXDB/New+HTTP+API+Design
[3] https://ci.apache.org/projects/asterixdb/api.html


Re: Important to Use a Separate Disk for Logging on SSDs

2019-02-21 Thread Chen Luo
I re-attached the image as follows. In case it still doesn't show up, the
average point lookup throughput of *SSD for LSM + Logging* is only around
*3-4k/s*. When a separate hard disk is used for logging, the average point
lookup throughput reaches *30k-40k/s*.

[image: image.png]

Best regards,
Chen Luo

On Thu, Feb 21, 2019 at 10:01 AM abdullah alamoudi 
wrote:

> Thanks for sharing Chen, very interesting.
>
> The image doesn't show up for me. Not sure if it shows up for others?
>
> Cheers,
> Abdullah.
>
> On Wed, Feb 20, 2019 at 1:29 PM Chen Luo  wrote:
>
> > Hi Devs,
> >
> > Recently I've been running experiments with concurrent ingestions and
> > queries on SSDs. I'd like to share an important lesson from my
> experiments.
> > In short, *it is very important (from the performance perspective) to use
> > a separate disk for logging, even though SSDs are good at random I/Os*.
> >
> > The following experiment illustrates this point. I was using YCSB with
> > 100GB base data (100M records, each has 1KB). During each experiment,
> there
> > was a constant data arrival process of 3600 records/s. I executed
> > concurrent point lookups (uniformly distributed) as much as possible
> using
> > 16 query threads (to saturate the disk). The page size was set to 4KB.
> The
> > experiments were performed on SSDs. The only difference is that one
> > experiment had a separate hard disk for logging, while the other used the
> > same SSD for both LSM and logging. The point lookup throughput over time
> > was plotted below. The negative impact of logging is huge!
> >
> > [image: image.png]
> >
> > The reason is that logging needs to frequently force disk writes (in this
> > experiment, the log flusher forces 70-80 times per second). Even though
> the
> > disk bandwidth used by the log flusher is small (4-5MB/s), the frequent
> > disk forces could seriously impact the overall disk throughput. If you
> have
> > a workload with concurrent data ingestion and queries, please DO consider
> > using a separate disk for logging to fully utilize the SSD bandwidth.
> >
> > Best regards,
> > Chen Luo
> >
>


Important to Use a Separate Disk for Logging on SSDs

2019-02-20 Thread Chen Luo
Hi Devs,

Recently I've been running experiments with concurrent ingestions and
queries on SSDs. I'd like to share an important lesson from my experiments.
In short, *it is very important (from the performance perspective) to use a
separate disk for logging, even though SSDs are good at random I/Os*.

The following experiment illustrates this point. I was using YCSB with
100GB base data (100M records, each has 1KB). During each experiment, there
was a constant data arrival process of 3600 records/s. I executed
concurrent point lookups (uniformly distributed) as much as possible using
16 query threads (to saturate the disk). The page size was set to 4KB. The
experiments were performed on SSDs. The only difference is that one
experiment had a separate hard disk for logging, while the other used the
same SSD for both LSM and logging. The point lookup throughput over time
was plotted below. The negative impact of logging is huge!

[image: image.png]

The reason is that logging needs to frequently force disk writes (in this
experiment, the log flusher forces 70-80 times per second). Even though the
disk bandwidth used by the log flusher is small (4-5MB/s), the frequent
disk forces could seriously impact the overall disk throughput. If you have
a workload with concurrent data ingestion and queries, please DO consider
using a separate disk for logging to fully utilize the SSD bandwidth.
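
(For scale, using the numbers above: 4-5 MB/s spread over 70-80 forces per
second is only about 60 KB written per force, so the slowdown comes from the
per-fsync latency on the shared device rather than from the log's bandwidth.)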

Best regards,
Chen Luo


Re: [VOTE] Release Apache AsterixDB 0.9.4 and Hyracks 0.3.4 (RC1)

2018-07-27 Thread Chen Luo
-1

Tested data ingestion with a tweet generator; however, I found that the
primary key index (a.k.a. the secondary primary index) is not maintained
properly by upsert feeds. All primary keys are filtered out by
LSMSecondaryUpsertOperatorNodePushable, which results in the primary key
index always being empty.

Best regards,
Chen Luo

On Mon, Jul 23, 2018 at 4:17 PM Ian Maxon  wrote:

> Based on some comments from Till I have added files containing the
> sha1 sum of the archives into the staged release repository. I've also
> added the .deb binary package that is in the maven repository into the
> svn staging repository as well.
>
> On Thu, Jul 19, 2018 at 10:49 PM, Xikui Wang  wrote:
> > +1
> >
> > - SHA1 verified for NCService Installer.
> > - Drop-in Twitter4j jar and verified twitter feed.
> > - UDF installation and query work fine.
> >
> > Best,
> > Xikui
> >
> > On Thu, Jul 19, 2018 at 1:41 PM Ian Maxon  wrote:
> >
> >> Here's a summary of all issues addressed in some way for this release,
> >> since last release:
> >>
> >> [ASTERIXDB-2397][*DB] Enable build on Java 10
> >> [ASTERIXDB-2397][*DB] Fix sample cluster on Java 10
> >> [ASTERIXDB-2397][*DB] Enable execution on Java 9/10
> >> [ASTERIXDB-2396][LIC] Include netty-all required NOTICEs
> >> [ASTERIXDB-2318] Build dashboard in mvn
> >> [ASTERIXDB-2387][MTD] Prevent Dataset Primary Index Drop
> >> [ASTERIXDB-2377][OTH] Fix JSON of Additional Expressions
> >> [ASTERIXDB-2354][COMP] Partition constraint propagation for binary
> >> operators
> >> [ASTERIXDB-2355][SQL] Incorrect error reporting by SQL++ parser
> >> [ASTERIXDB-2358][LIC] Fix asterix-replication packaging
> >> [ASTERIXDB-2347][DOC] Update Configurable Parameters
> >> [ASTERIXDB-2361][HYR] Memory Leak Due to Netty Close Listeners
> >> [ASTERIXDB-2353][HYR][RT][FAIL] Provide complete thread dumps
> >> [ASTERIXDB-2352][FUN] Incorrect leap year handling in duration
> arithmetic
> >> [ASTERIXDB-2351][COMP] Allow '+' after exponent indicator in double
> >> literals
> >> [ASTERIXDB-2343][FUN] Implement to_array(), to_atomic(), to_object()
> >> [ASTERIXDB-2348][COMP] Incorrect result with distinct aggregate
> >> [ASTERIXDB-2346][COMP] Constant folding should not fail on runtime
> >> exceptions
> >> [ASTERIXDB-2345][FUN] Fix runtime output type for object_names()
> >> [ASTERIXDB-2340][FUN] Implement object_length(), object_names()
> >> [ASTERIXDB-2334] Fix Range Predicate for Composite Key Search
> >> [ASTERIXDB-2216] Disable flaky test which depends on external site
> >> [ASTERIXDB-1708][TX] Prevent log deletion during scan
> >> [ASTERIXDB-2321][STO] Follow the contract in IIndexCursor.open calls
> >> [ASTERIXDB-2332][RT] Fix concurrency issue with RecordMerge and
> >> RecordRemoveFields
> >> [ASTERIXDB-2308][STO] Prevent Race To Allocate Memory Components
> >> [ASTERIXDB-2319][TEST] Split Queries in start-feed Test
> >> [ASTERIXDB-2285][TEST] Increase Poll Time on Test
> >> [ASTERIXDB-2317] Intermittent Failure in Kill CC NCServiceExecutionIT
> >> [ASTERIXDB-2329][MTD] Remove Invalid Find Dataset
> >> [ASTERIXDB-2330][*DB][RT] Add IFunctionRegistrant for dynamic function
> >> registration
> >> [ASTERIXDB-2320][CLUS] Don't delay removing dead node on max heartbeat
> >> misses
> >> [ASTERIXDB-2316][STO] Fix Merging Components For Full Merge
> >> [ASTERIXDB-2213] Guard against concurrent config updates
> >> [ASTERIXDB-1952][TX][IDX] Filter logs pt.2
> >> [ASTERIXDB-1280][TEST] JUnit cleanup
> >> [ASTERIXDB-2305][FUN] replace() should not accept regular expressions
> >> [ASTERIXDB-2313][EXT] JSONDataParser support for non-object roots
> >> [ASTERIXDB-2229][OTR] Restore Thread Names in Thread Pool
> >> [ASTERIXDB-2304] Ensure Flush is Finished in FlushRecoveryTest
> >> [ASTERIXDB-2307][COMP] Incorrect result with quantified expression
> >> [ASTERIXDB-2303][API] Fix Supplementary Chars Printing
> >> [ASTERIXDB-2148][FUN] Add init parameter for external UDF
> >> [ASTERIXDB-2301][TX] Fix Abort of DELETE operation
> >> [ASTERIXDB-2074][MVN] Fix manifest metadata
> >> [ASTERIXDB-2302][COMP] Incorrect result with non-enforced index
> >> [ASTERIXDB-2299] Set log type properly during modifications
> >> [ASTERIXDB-2188] Ensure recovery of component ids
> >> [ASTERIXDB-2227][ING] Enabling filitering incoming data in feed
> >> [ASTERIXDB-2296][COMP] proper handling of an optional subfield type
> >> [

Re: AsterixDB merge compaction issue with rtree index

2018-07-15 Thread Chen Luo
Hi Mohiuddin,

This bug has existed for a year or so (
https://issues.apache.org/jira/browse/ASTERIXDB-2125?filter=-2). It seems
that during the RTree bulkloading process some MBR calculations are wrong
(but this is hard to reproduce). If one merge operation fails, then all
future retries will fail as well, since the same bug is always
triggered...

Best regards,
Chen Luo

On Sun, Jul 15, 2018 at 2:22 AM Mohiuddin Qader  wrote:

> After debugging and looking at the logs, I tracked down the problem.
> It looks like the merge compaction fails due to a data type issue for
> points during the MBR calculation at the end of bulk loading (I don't know
> why) and throws an exception. But the main bug here is that there is no
> follow-up to clean up the failed merged file, and thus no subsequent
> compaction can occur. I attach the screenshot.
>
>
> org.apache.hyracks.algebricks.common.exceptions.NotImplementedException:
> Value provider for type bigint is not implemented.
>
>
>
>
> On Sat, Jul 14, 2018 at 9:53 PM, Mohiuddin Qader  wrote:
>
>> Hello everyone,
>>
>> I have been running some experiments on current master AsterixDB with
>> default merge policy. I am using simple OpenStreetMap and Twitter dataset
>> with id used as primary key and a Rtree index on location attribute.
>>
>> Then I used a feed to ingest the data into the database. After running for
>> some time (about 2 million insertions), the RTree index
>> (LSMRTreeWithAntiMatterTuples) throws a strange exception saying the file
>> already exists. After this exception, the cluster keeps throwing the same
>> exception repeatedly and becomes unusable.
>>
>> 20:36:55.980 [Executor-24:asterix_nc1] ERROR
>> org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness - Failed merge
>> operation on {"class" : "LSMRTreeWithAntiMatterTuples", "dir" :
>> "/home/mohiuddin/asterix-hyracks/asterixdb/target/io/dir/asterix_nc1/target/tmp/asterix_nc1/iodevice1/storage/partition_0/experiments/OpenStreetMap/0/OSMlocation",
>> "memory" : 2, "disk" : 5}
>> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0082: Failed
>> to create the file
>> /home/mohiuddin/asterix-hyracks/asterixdb/target/io/dir/asterix_nc1/target/tmp/asterix_nc1/iodevice1/storage/partition_0/experiments/OpenStreetMap/0/OSMlocation/2018-07-14-20-36-09-733_2018-07-14-20-34-55-555
>> because it already exists
>> at
>> org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
>> ~[classes/:?]
>> at org.apache.hyracks.api.util.IoUtil.create(IoUtil.java:87)
>> ~[classes/:?]
>> at
>> org.apache.hyracks.storage.common.buffercache.BufferCache.createFile(BufferCache.java:809)
>> ~[classes/:?]
>> at 
>> org.apache.hyracks.storage.am.common.impls.AbstractTreeIndex.create(AbstractTreeIndex.java:83)
>> ~[classes/:?]
>> at 
>> org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMDiskComponent.activate(AbstractLSMDiskComponent.java:158)
>> ~[classes/:?]
>> at 
>> org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.createDiskComponent(AbstractLSMIndex.java:427)
>> ~[classes/:?]
>> at 
>> org.apache.hyracks.storage.am.lsm.rtree.impls.LSMRTreeWithAntiMatterTuples.doMerge(LSMRTreeWithAntiMatterTuples.java:237)
>> ~[classes/:?]
>> at 
>> org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.merge(AbstractLSMIndex.java:728)
>> ~[classes/:?]
>> at 
>> org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.merge(LSMHarness.java:645)
>> [classes/:?]
>> at 
>> org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.merge(LSMTreeIndexAccessor.java:128)
>> [classes/:?]
>> at 
>> org.apache.hyracks.storage.am.lsm.common.impls.MergeOperation.call(MergeOperation.java:45)
>> [classes/:?]
>> at 
>> org.apache.hyracks.storage.am.lsm.common.impls.MergeOperation.call(MergeOperation.java:30)
>> [classes/:?]
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> [?:1.8.0_45-internal]
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> [?:1.8.0_45-internal]
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> [?:1.8.0_45-internal]
>> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45-internal]
>>
>>
>> I have tried reinstalling everything multiple times, removing all old
>> storage files, and pausing during and before ingestion through feeds.
>> Nothing seems to work; every time, this exception occurs after some
>> ingestion. Do any of you have an idea of what is happening? I have
>> attached the DDL I was using.
>>
>>
>>
>> --
>> Regards,
>> Mohiuddin Abdul Qader
>> Dept of Computer Science
>> University of California Riverside
>>
>
>
>
> --
> Regards,
> Mohiuddin Abdul Qader
> Dept of Computer Science
> University of California Riverside
>


Re: Help required for AsterixDB Use case

2018-06-14 Thread Chen Luo
The documentation contains a guide to setting up AsterixDB clusters (
https://ci.apache.org/projects/asterixdb/ncservice.html).

The normal way for an application to talk to AsterixDB is to use the
HTTP query API (https://ci.apache.org/projects/asterixdb/api.html). Also,
you can check out Cloudberry (http://cloudberry.ics.uci.edu/), which is a
data visualization project built on top of AsterixDB.

Best regards,
Chen Luo

On Thu, Jun 14, 2018 at 10:10 AM Khurram Faraaz 
wrote:

> You will find details here
> https://asterixdb.apache.org/docs/0.9.3/
>
> Thanks,
> Khurram
>
> On Thu, Jun 14, 2018, 8:43 AM Shobhit Chourasiya <
> shobhitchouras...@gmail.com> wrote:
>
> > Many Thanks for your response Mike!! :)
> >
> > In the links provided, I could not find anyone with steps for setting up
> an
> > application with AsterixDB. Does anyone have any clue if there is any
> > link/article available, which shows how to set up an application with
> > AsterixDB?
> >
> > Best Regards,
> > Shobhit Chourasiya
> >
> > On Wed, Jun 13, 2018 at 10:13 AM, Mike Carey  wrote:
> >
> > > (+ dev)
> > >
> > > Yes, for reasons we don't understand, HBase is a little more popular
> and
> > > better known than AsterixDB.  :-)
> > >
> > > More seriously, http://asterix.ics.uci.edu/publications.html has
> > most/all
> > > of the AsterixDB publications, which may help.  They are VERY different
> > > systems - AsterixDB is essentially what you get if you approach the
> > > document database problem from the standpoint of giving users
> > "everything"
> > > they would get in a parallel relational DBMS but with a relaxed (and
> you
> > > can tune how relaxed) data model - no 1NF requirement, no schema
> mandate,
> > > ordered as well as unordered options for multivalued fields, etc. -
> and a
> > > full query language (first AQL, nowadays SQL++) that can do everything
> > you
> > > can do in SQL92 and more (in order to accommodate the data model
> > changes).
> > >
> > > Cheers,
> > >
> > > Mike
> > >
> > > On 6/12/18 1:00 AM, Shobhit Chourasiya wrote:
> > >
> > > Hello Everyone,
> > >
> > > First of all, I would humbly say that I am a very new user of AsterixDB
> > > along with Bigdata and Hadoop. I have quite limited knowledge related
> to
> > > the system, so I apologize in advance if you have to be patient and may
> > > have to describe some lower-level details (or links related to them) to
> > answer
> > > my queries.
> > >
> > > As an assignment at my university, I need to analyze the functionality of
> > > AsterixDB and HBase and need to find differences in both. Although I
> went
> > > through some articles related to installation and initial queries on
> > > AsterixDB page, I am still finding difficulty in understanding a
> complex
> > > scenario for using AsterixDB and how to show case it, because the
> > > information on web is quite limited for AsterixDB, though there's
> > abundant
> > > information present for HBase.
> > >
> > > If anyone can help me or provide me some leads related to it, I would
> be
> > > very grateful. Please help
> > >
> > > Many Thanks!
> > >
> > > Best Regards,
> > > Shobhit Chourasiya
> > >
> > >
> > >
> >
>


Re: About null values in secondary index

2018-05-27 Thread Chen Luo
Got it, thanks! Just submitted a patch to fix it. Also, the same problem
exists when we upsert into a secondary index. Only missing values are
filtered out, but not null values.

Best regards,
Chen Luo

On Sun, May 27, 2018 at 1:27 PM, Mike Carey <dtab...@gmail.com> wrote:

> Indeed.
>
> On Sun, May 27, 2018, 11:40 AM abdullah alamoudi <bamou...@gmail.com>
> wrote:
>
> > I think that this would be a bug if that was the case. I am pretty sure
> at
> > some point, we had a filter for nulls.
> >
> > Cheers,
> > Abdullah.
> >
> > > On May 26, 2018, at 5:25 AM, Chen Luo <cl...@uci.edu> wrote:
> > >
> > > Hi devs,
> > >
> > > I've a question about null values in secondary index. I've assumed that
> > > null values are simply excluded from the secondary index. For example,
> > if I
> > > build a secondary index on Name, then a record with Name being null
> would
> > > not be indexed. I also saw in IntroduceSecondaryIndexInsertDeleteRule
> we
> > > created some filtering expression to filter out null secondary values.
> > >
> > > However, when we bulkload a dataset, this filtering expression is never
> > > executed when loading a secondary index, which implies all null values
> > are
> > > added to the secondary index. Is it safe to assume this is a bug?
> > >
> > > Best regards,
> > > Chen Luo
> >
> >
>


About null values in secondary index

2018-05-25 Thread Chen Luo
Hi devs,

I have a question about null values in secondary indexes. I've assumed that
null values are simply excluded from the secondary index. For example, if I
build a secondary index on Name, then a record with Name being null would
not be indexed. I also saw in IntroduceSecondaryIndexInsertDeleteRule we
created some filtering expression to filter out null secondary values.

However, when we bulkload a dataset, this filtering expression is never
executed when loading a secondary index, which implies all null values are
added to the secondary index. Is it safe to assume this is a bug?

Best regards,
Chen Luo


Instant Locking during Primary Scans

2018-05-08 Thread Chen Luo
Hi Devs,

I recently noticed some performance degradation of primary index
scans after syncing with current master (15% - 20% with records of
~300 bytes each). It seems this is caused by a recent patch which requires
an instant S lock for every record in all components [1]. Acquiring a lock
for each record is relatively expensive, and it becomes even worse for
smaller records. I believe we previously omitted locking when
scanning disk components intentionally, because records in disk components
must have already been committed and cannot change (I assume we're
implementing something similar to "Read Committed" semantics?)

I propose that this patch be somehow reverted to avoid locking on
a per-record basis. As for the index-only patch, one can safely proceed if a
tuple is found in a disk component, again because it has already been
committed. Thoughts?

Best regards,
Chen Luo


[1] https://asterix-gerrit.ics.uci.edu/#/c/2623/


Re: Restart Cluster during Execution Tests

2018-05-02 Thread Chen Luo
Hi Steven,

I've written a similar test case previously to test the recovery of LSM
components.
https://asterix-gerrit.ics.uci.edu/#/c/2408/7/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/dataflow/LSMFlushRecoveryTest.java

Basically, we can use the NC controller to shut down/start a node manually.

Best regards,
Chen Luo

On Wed, May 2, 2018 at 1:04 PM, Steven Jacobs <sjaco...@ucr.edu> wrote:

> Killing nodes starts the global recovery process anyway right? That's all
> that I actually need. How is this done?
> Steven
>
> On Wed, May 2, 2018 at 1:02 PM, Ian Maxon <ima...@uci.edu> wrote:
>
> > We kill and bring back nodes but not the entire cluster, to my knowledge.
> >
> > On Wed, May 2, 2018 at 12:48 PM, Steven Jacobs <sjaco...@ucr.edu> wrote:
> > > Hi all,
> > > Do we have any method in place to restart the cluster during an
> execution
> > > test?
> > > Steven
> >
>


Re: Is there an easier way to wrap/unwrap the entire tuple as a ByteBuffer?

2018-04-10 Thread Chen Luo
Hi,

You can try IFrameFieldAppender (and its implementation
FrameFixedFieldAppender) to directly append wrapped tuple (field by field)
to the output buffer, without going through the array tuple builder. But in
general, because of the tuple format, I'm not sure there is a more
efficient way to wrap/unwrap tuples directly.

Best regards,
Chen Luo

On Tue, Apr 10, 2018 at 10:33 AM, Muhammad Abu Bakar Siddique <
msidd...@ucr.edu> wrote:

> Hi Dev,
> I'm working on a Hyracks application for parallel random sampling which
> consists of two operators. The first operator generates and appends a new
> field to each tuple while the second operator processes that additional
> field and removes it before writing the final output. So, the output of the
> second operator should have the same format of the input of the first
> operator. In other words, I want the first operator to wrap the tuple as-is
> and add an additional field while the second operator should remove and
> unwrap the tuple. Currently, I use the FrameTupleAppender and
> ArrayTupleAppender where I have to add each field in the input record
> separately but it seems to be an overhead in the code. Is there an easier
> way to wrap/unwrap the entire tuple as a ByteBuffer without having to worry
> about the individual fields inside it?
>


Re: Great LSM Ingestion news!

2018-04-04 Thread Chen Luo
Thanks Steven! BTW, as your dataset gets larger, each insert might become
more and more expensive. This patch
https://asterix-gerrit.ics.uci.edu/#/c/2453/ can help once you encounter
that problem...

On Wed, Apr 4, 2018 at 11:27 AM, Mike Carey  wrote:

> Nice!!!
>
>
>
> On 4/4/18 10:32 AM, Steven Jacobs wrote:
>
>> Just wanted to share for anyone who wanted positive news today. Two of
>> Chen's recent changes:
>> https://asterix-gerrit.ics.uci.edu/#/c/2560/
>> https://asterix-gerrit.ics.uci.edu/#/c/2553/
>>
>> will have a huge impact on write performance. I have a query that ends up
>> writing about 40,000 records, and the time was reduced from 5 seconds down
>> to less than a second just by incorporating these changes!
>>
>> Thank you Chen! May the odds be ever in your favor!
>>
>> Steven
>>
>>
>


Re: "Erroneous" Sync Write Mode in ResultState

2018-04-02 Thread Chen Luo
This was introduced by a recent fix [1]. It seems that forcing writes when
the ResultState is closed should give the same semantics (but with much
higher throughput)?


[1] https://asterix-gerrit.ics.uci.edu/#/c/2319/

On Mon, Apr 2, 2018 at 10:27 AM, Chen Luo <cl...@uci.edu> wrote:

> Hi Devs,
>
> As I saw last week, using the "sync" write mode to write data (i.e., in
> Transaction Log Flusher) can lead to bad write throughput if the write
> buffer is small. However, I just saw "sync" write mode is in ResultState
> when we store the query result [1]. *Is this "sync" mode really needed on
> a frame-basis*? The frame size is typically 32KB, and this could lead to
> low throughput if a user wants to pull a lot of data out of our system. If
> there is no specific need for this "sync" write, I'll submit a change to
> fix it. Thanks!
>
> [1] https://github.com/apache/asterixdb/blob/77c8c79077ec33fae5944d982a337b784608aa87/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-nc/src/main/java/org/apache/hyracks/control/nc/dataset/ResultState.java#L130
>
> Best regards,
> Chen Luo
>


"Erroneous" Sync Write Mode in ResultState

2018-04-02 Thread Chen Luo
Hi Devs,

As I saw last week, using the "sync" write mode to write data (i.e., in the
Transaction Log Flusher) can lead to bad write throughput if the write
buffer is small. However, I just saw that the "sync" write mode is used in
ResultState when we store the query result [1]. *Is this "sync" mode really
needed on a per-frame basis*? The frame size is typically 32KB, and this
could lead to low throughput if a user wants to pull a lot of data out of
our system. If there is no specific need for this "sync" write, I'll submit
a change to fix it. Thanks!

[1]
https://github.com/apache/asterixdb/blob/77c8c79077ec33fae5944d982a337b784608aa87/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-nc/src/main/java/org/apache/hyracks/control/nc/dataset/ResultState.java#L130
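
For illustration only, a minimal generic Java sketch (not the Hyracks
ResultState code; whether its "sync" mode maps exactly onto FileChannel.force
is an assumption) of the two patterns being compared: syncing every small
frame versus writing normally and forcing once before close:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public final class SyncModes {
        // Pattern 1: force after every frame (slow for 32KB frames).
        static void writeFrameSynced(FileChannel ch, ByteBuffer frame) throws IOException {
            while (frame.hasRemaining()) {
                ch.write(frame);
            }
            ch.force(false);
        }

        // Pattern 2: write frames normally, force once when closing the result.
        static void closeWithForce(FileChannel ch) throws IOException {
            ch.force(false);
            ch.close();
        }
    }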

Best regards,
Chen Luo


Re: Current Default Log Buffer Size is Too Small

2018-03-30 Thread Chen Luo
I'll make a table instead...

Page Size    Write Throughput (MB/s)
64KB         1.61
128KB        3.14
256KB        5.69
512KB        9.52
1024KB       17.41
2048KB       28.29
4096KB       41.63
8192KB       56.26
16384KB      72.11


On Fri, Mar 30, 2018 at 3:25 PM, abdullah alamoudi <bamou...@gmail.com>
wrote:

> Am I the only one who didn't get the image in the email?
>
> > On Mar 30, 2018, at 3:22 PM, Chen Luo <cl...@uci.edu> wrote:
> >
> > An update on this issue. It seems this speed-up comes from simply
> increasing the log page size (and I've submitted a patch
> https://asterix-gerrit.ics.uci.edu/#/c/2553/).
> >
> > I also wrote a simple program to test the write throughput w.r.t.
> different page sizes:
> >     for (int i = 0; i < numPages; i++) {
> >         byteBuffer.rewind();
> >         while (byteBuffer.hasRemaining()) {
> >             totalBytesWritten += channel.write(byteBuffer);
> >         }
> >         channel.force(false);
> >     }
> > It also confirms that varying page size can have a big impact on the
> disk throughput (even for sequential I/Os). The experiment result on one
> of our sensorium node is as follows:
> >
> >
> >
> >
> > On Tue, Mar 27, 2018 at 5:19 PM, Chen Luo <cl...@uci.edu  cl...@uci.edu>> wrote:
> > Hi Devs,
> >
> > Recently I was doing ingestion experiments, and found out our default
> log buffer size (1MB = 8 pages * 128KB page size) is too small, and
> negatively impacts the ingestion performance. The short conclusion is that
> by simply increasing the log buffer size (e.g., to 32MB), I can improve the
> ingestion performance by 50% ~ 100% on a single node sensorium machine as
> shown follows.
> >
> >
> > The detailed explanation of log buffer size is as follows. Right now we
> have a background LogFlusher thread which continuously forces log records
> to disk. When the log buffer is full, writers are blocked to wait for log
> buffer space. However, when setting the log buffer size, we have to
> consider the LSM operations as well. The memory component is first filled
> up with incoming records at a very high speed, which is then flushed to
> disk at a relatively low speed. If the log buffer size is small, ingestion
> is very likely to be blocked by the LogFlusher when filling up the memory
> component. This blocking is wasted since quite often flush/merge is idle.
> However, when the log buffer is relatively large, the LogFlush can catch up
> itself when ingestion is blocked by flush/merge, which is not harmful since
> there is ongoing LSM I/O operations.
> >
> > I didn't know how large the log buffer size should be right now (as it
> depends on various factors), but our default value 1MB is very likely too
> small to cause blocking during normal ingestion time. Just let you know and
> be aware of this parameter when you measure ingestion performance...
> >
> > Best regards,
> > Chen Luo
> >
> >
>
>


Re: Current Default Log Buffer Size is Too Small

2018-03-30 Thread Chen Luo
An update on this issue. It seems this speed-up comes from simply
increasing the log page size (and I've submitted a patch
https://asterix-gerrit.ics.uci.edu/#/c/2553/).

I also wrote a simple program to test the write throughput w.r.t. different
page sizes:

    // channel is a FileChannel opened for writing; byteBuffer holds one page.
    for (int i = 0; i < numPages; i++) {
        byteBuffer.rewind();
        while (byteBuffer.hasRemaining()) {
            totalBytesWritten += channel.write(byteBuffer);
        }
        channel.force(false);
    }

It also confirms that varying the page size can have a big impact on the disk
throughput (even for sequential I/Os). The experiment result on one of our
sensorium nodes is as follows:





On Tue, Mar 27, 2018 at 5:19 PM, Chen Luo <cl...@uci.edu> wrote:

> Hi Devs,
>
> Recently I was doing ingestion experiments, and found out our default log
> buffer size (1MB = 8 pages * 128KB page size) is too small, and negatively
> impacts the ingestion performance. The short conclusion is that by simply
> increasing the log buffer size (e.g., to 32MB), I can improve the ingestion
> performance by *50% ~ 100%* on a single node sensorium machine as shown
> follows.
>
>
> The detailed explanation of log buffer size is as follows. Right now we
> have a background LogFlusher thread which continuously forces log records
> to disk. When the log buffer is full, writers are blocked to wait for log
> buffer space. However, when setting the log buffer size, we have to
> consider the LSM operations as well. The memory component is first filled
> up with incoming records at a very high speed, which is then flushed to
> disk at a relatively low speed. If the log buffer size is small, ingestion
> is very likely to be blocked by the LogFlusher when filling up the memory
> component. This blocking is wasted since quite often flush/merge is idle.
> However, when the log buffer is relatively large, the LogFlush can catch up
> itself when ingestion is blocked by flush/merge, which is not harmful since
> there is ongoing LSM I/O operations.
>
> I didn't know how large the log buffer size should be right now (as it
> depends on various factors), but our default value *1MB* is very likely
> too small to cause blocking during normal ingestion time. Just let you know
> and be aware of this parameter when you measure ingestion performance...
>
> Best regards,
> Chen Luo
>
>


Current Default Log Buffer Size is Too Small

2018-03-27 Thread Chen Luo
Hi Devs,

Recently I was doing ingestion experiments, and found out that our default log
buffer size (1MB = 8 pages * 128KB page size) is too small and negatively
impacts ingestion performance. The short conclusion is that by simply
increasing the log buffer size (e.g., to 32MB), I can improve the ingestion
performance by *50% ~ 100%* on a single-node sensorium machine, as shown
below.


The detailed explanation of log buffer size is as follows. Right now we
have a background LogFlusher thread which continuously forces log records
to disk. When the log buffer is full, writers are blocked to wait for log
buffer space. However, when setting the log buffer size, we have to
consider the LSM operations as well. The memory component is first filled
up with incoming records at a very high speed, which is then flushed to
disk at a relatively low speed. If the log buffer size is small, ingestion
is very likely to be blocked by the LogFlusher when filling up the memory
component. This blocking is wasted since quite often flush/merge is idle.
However, when the log buffer is relatively large, the LogFlusher can catch up
while ingestion is blocked by flush/merge, which is not harmful since
there are ongoing LSM I/O operations.

I don't know how large the log buffer size should be right now (as it
depends on various factors), but our default value of *1MB* is very likely
small enough to cause blocking during normal ingestion. Just letting you know;
be aware of this parameter when you measure ingestion performance...

Best regards,
Chen Luo


Re: Question About Dataset Granule Locking

2018-02-25 Thread Chen Luo
It seems the link was deleted by the email... Here is the link for
FlushDatasetOperatorDescriptor:

https://github.com/apache/asterixdb/blob/5070d633eaee536c20706e59891a44a6257d8bd8/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/operators/std/FlushDatasetOperatorDescriptor.java#L82



On Sun, Feb 25, 2018 at 10:04 AM, Chen Luo <cl...@uci.edu> wrote:

> Hi Devs,
>
> I saw a few places where we've used -1 as the entity hash value during
> locking, and there is one comment saying that "lock the dataset granule" in
> FlushDatasetOperatorDescriptor [1]. However, after checking the source code
> of ConcurrentLockManager, I didn't see any places that actually support
> dataset granule locking. I also wrote a test case to perform a dataset S
> lock and then ingest data, which fails because data ingestions can still go
> through. I was wondering whether we actually support dataset granule locking
> right now?
>
>
>
>
> [1]
>
> https://github.com/apache/asterixdb/blob/5070d633eaee536c20706e59891a44a6257d8bd8/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/operators/std/FlushDatasetOperatorDescriptor.java#L82
>
>


Question About Dataset Granule Locking

2018-02-25 Thread Chen Luo
Hi Devs,

I saw a few places where we've used -1 as the entity hash value during
locking, and there is one comment saying that "lock the dataset granule" in
FlushDatasetOperatorDescriptor [1]. However, after checking the source code
of ConcurrentLockManager, I didn't see any places that actually support
dataset granule locking. I also wrote a test case to perform a dataset S
lock and then ingest data, which fails because data ingestions can still go
through. I was wondering whether we actually support dataset granule locking
right now?




[1]

https://github.com/apache/asterixdb/blob/5070d633eaee536c20706e59891a44a6257d8bd8/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/operators/std/FlushDatasetOperatorDescriptor.java#L82


Re: Move of AsterixHyracksIntegrationUtil from production to test

2018-02-07 Thread Chen Luo
To run AsterixHyracksIntegrationUtil from Maven, you may try adding the
parameter "-Dexec.classpathScope=test".
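
The combined invocation might look like this (assuming the same main class as
in the original command; exec.classpathScope is the standard exec-maven-plugin
property for choosing the classpath scope):

    mvn exec:java -Dexec.mainClass="org.apache.asterix.api.common.AsterixHyracksIntegrationUtil" -Dexec.classpathScope=test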

Best regards,
Chen Luo

On Wed, Feb 7, 2018 at 12:33 PM, Ian Maxon <ima...@uci.edu> wrote:

> It always belonged in test, so it just got moved there. What you're
> seeing in IntelliJ, at least, is probably just a stale run profile. Try
> going to the class itself, context-clicking in the editor window, and
> running it from there; that should create a run profile with the right
> values.
>
> On Wed, Feb 7, 2018 at 11:07 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:
> > Hi,
> >
> > I see that recently the AsterixHyracksIntegrationUtil was moved from
> > production to test in the following commit.
> >
> >>   [NO ISSUE] Move AsterixHyracksIntegrationUtil from production to test
> >>
> >>   Change-Id: Id603d0f1ac17b977356e628a89845d240c8aa8b7
> >>   Reviewed-on: https://asterix-gerrit.ics.uci.edu/2311
> >>   Tested-by: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
> >>   Integration-Tests: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
> >>   Contrib: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
> >>   Reviewed-by: Till Westmann <ti...@apache.org>
> >
> >
> > I couldn't find a discussion around this issue and I don't know why this
> > move was necessary. I used to run the *AsterixHyracksIntegrationUtil*
> for
> > testing and trying out AsterixDB and now I cannot. Running the following
> > command gives the error below.
> >
> >> $ASTERIXDB/asterixdb/asterix-app$ *mvn exec:java*
> >> -Dexec.mainClass="org.apache.asterix.api.common.
> AsterixHyracksIntegrationUtil"
> >
> >
> >
> >> java.lang.ClassNotFoundException:
> >> org.apache.asterix.api.common.AsterixHyracksIntegrationUtil
> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >> at org.codehaus.mojo.exec.ExecJavaMojo$1.run(
> ExecJavaMojo.java:270)
> >> at java.lang.Thread.run(Thread.java:745)
> >
> >
> > Although this is a less preferred way to me, I tried to run it from
> *IntelliJ
> > IDEA* but I got the following error.
> >
> >> java.lang.IllegalStateException: java.lang.ClassNotFoundException:
> >> org.apache.asterix.runtime.evaluators.functions.records.
> FieldAccessByIndexDescriptor$_Gen
> >> at
> >> org.apache.asterix.runtime.functions.FunctionCollection.
> getGeneratedFunctionDescriptorFactory(FunctionCollection.java:750)
> >> at
> >> org.apache.asterix.runtime.functions.FunctionCollection.addGenerated(
> FunctionCollection.java:348)
> >> at
> >> org.apache.asterix.runtime.functions.FunctionCollection.
> createDefaultFunctionCollection(FunctionCollection.java:473)
> >> at
> >> org.apache.asterix.app.cc.CCExtensionManager.(
> CCExtensionManager.java:101)
> >> at
> >> org.apache.asterix.hyracks.bootstrap.CCApplication.start(
> CCApplication.java:151)
> >> at
> >> org.apache.hyracks.control.cc.ClusterControllerService.
> startApplication(ClusterControllerService.java:236)
> >> at
> >> org.apache.hyracks.control.cc.ClusterControllerService.start(
> ClusterControllerService.java:222)
> >> at
> >> org.apache.asterix.api.common.AsterixHyracksIntegrationUtil.init(
> AsterixHyracksIntegrationUtil.java:144)
> >> at
> >> org.apache.asterix.api.common.AsterixHyracksIntegrationUtil.init(
> AsterixHyracksIntegrationUtil.java:177)
> >> at
> >> org.apache.asterix.api.common.AsterixHyracksIntegrationUtil.run(
> AsterixHyracksIntegrationUtil.java:333)
> >> at
> >> org.apache.asterix.api.common.AsterixHyracksIntegrationUtil.main(
> AsterixHyracksIntegrationUtil.java:102)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at
> >> sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> >> at
> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> >> at java.lang.reflect.Method.invoke(Method.java:498)
> >> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> >> Caused by: java.lang.ClassNotFoundException:
> >> org.apache.asterix.runtime.evaluators.functions.records.
> FieldAccessByIndexDescriptor$_Gen
> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >> at
> >> org.apache.asterix.runtime.functions.FunctionCollection.
> getGeneratedFunctionDescriptorFactory(FunctionCollection.java:746)
> >> ... 15 more
> >
> >
> >
> > is there an alternative easy way to run *AsterixHyracksIntegrationUtil*
> or
> > a similar program?
> >
> > --
> > Best regards,
> > Ahmed Eldawy
>


Re: AsterixDB Performance Tuning

2018-01-29 Thread Chen Luo
Just making sure this email gets heard properly... I don't have too much
expertise on nested field access, but I think this is a fair comparison
excluding the effects of the storage layer (both datasets are cached).
Moreover, it's also highly unlikely that MongoDB cheats by building some
key index, as Rana is not accessing the primary key here...

Best regards,
Chen Luo

On Sat, Jan 27, 2018 at 9:15 PM, Rana Alotaibi <ralot...@eng.ucsd.edu>
wrote:

> Hi all,
>
> I have a follow-up issue using the same query. Let's forget about the join
> and selection predicates. I have the following query (same as previous
> one), but without the join and selection predicates:
>
> *AsterixDB Query: *
> USE mimiciii;
> SET `compiler.parallelism` "5";
> SET `compiler.sortmemory` "128MB";
> SET `compiler.joinmemory` "265MB";
> SELECT COUNT(*) AS cnt
> FROM   PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
>
> Result : {cnt:22237108}
>
> Please note I have changed the default page-size configuration from 128 KB
> to 1 MB, and the buffer cache stays the same (57 GB).
>
> *MongoDB Equivalent Query:*
> db.patients.aggregate(
>[
>{
>"$unwind":"$ADMISSIONS",
>
>},
>{
>"$unwind":"$ADMISSIONS.LABEVENTS",
>
>},
>
>        { $count: "cnt" }
>
>  ]
>   )
>
> Result : {cnt:22237108}
>
> The query takes ~7min in AsterixDB, and 30sec in MongoDB given that
> MongoDB is running on a single core. I don't think that MongoDB's storage
> compression techniques play a factor here (all the data is cached in
> memory), unless MongoDB does some in-memory compression (which I need to
> investigate further).
>
> Does this suggest that navigating deeply into nested fields is an
> expensive operation in AsterixDB, or do I still have some issues with my
> AsterixDB configuration parameters?
>
> Thanks
> Rana
>
>
> On Sat, Jan 27, 2018 at 3:45 PM, Rana Alotaibi <ralot...@eng.ucsd.edu>
> wrote:
>
>> Hi Mike,
>>
>> Here is some results:
>> 1) Non-reordered FROM clause + no bcast added ~12mins
>> 2) Reordered FROM clause + no bcast added ~12mins (same as (1))
>> 3) Non-reordered FROM clause + bcast added ~6mins
>> 4) Reordered FROM clause + bcast added ~6mins
>>
>> It seems the order of the datasets in the FROM clause has no impact. But in
>> both cases, the bcast hint reduced the execution time.
>>
>> As for querying MongoDB, I'm essentially writing a "logical" plan for that
>> query (it took me days to understand MongoDB's query operators). I totally
>> prefer SQL++ with hints. However, thinking about data scientists who are
>> mostly familiar with SQL queries, I wouldn't expect them to spend time
>> determining, for example, predicate selectivities and then deciding which
>> join algorithm is appropriate and specifying it in their query (i.e.,
>> /*+ indexnl */) (basically they end up doing the cost-based optimizer's job :) ).
>>
>> Thanks,
>> --Rana
>>
>> On Fri, Jan 26, 2018 at 2:15 PM, Mike Carey <dtab...@gmail.com> wrote:
>>
>>> Rana,
>>>
>>> We need the physical hints because we have a conservative cost-minded
>>> rule set rather than an actual cost-based optimizer - so it always picks
>>> partitioned hash joins when doing joins.  (I am curious as to how much the
>>> bcast hint helps vs. the reordered from clause - what fraction does each
>>> contribute to the win? - it would be cool to have the numbers without and
>>> with that hint if you felt like trying that - but don't feel obligated).
>>>
>>> Question: In MongoDB, didn't you end up essentially writing a
>>> query-plan-like program to solve this query - and isn't the SQL++ with
>>> hints a lot smaller/simpler?  (Just asking - I'm curious as to your
>>> feedback on that.)  We'd argue that you can write a mostly declarative
>>> familiar query and then mess with it and annotate it a little to tune it -
>>> which isn't as good as a great cost-based optimizer, but is better than
>>> writing/maintaining a program.  Thoughts?
>>>
>>> In terms of how good we can get - the size answer is telling.  In past
>>> days, when we were normally either on par with (smaller things) or beating
>>> (larger things) MongoDB, they hadn't yet acquired their new storage engine
>>> company (WiredTiger) with its compression.  Now they're running 3x+
>>> smaller, I would not be surprised if that's now the explanation for the
>>> remaining difference, which 

Re: Primary key lookup plan

2017-12-03 Thread Chen Luo
I don't think that's the case... I tried on my local env, and it's using a
primary index lookup instead of a scan. Can you make sure the spelling of the
primary key is correct?

On Sun, Dec 3, 2017 at 3:49 PM, Wail Alkowaileet  wrote:

> Hi Devs,
>
> *For the given query:*
>
> SELECT VALUE t.text
> FROM ITweets as t
> WHERE t.tid = 100
>
> *The optimized plan:*
>
> distribute result [$$6]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> project ([$$6])
> -- STREAM_PROJECT  |PARTITIONED|
>   assign [$$6] <- [$$t.getField("text")]
>   -- ASSIGN  |PARTITIONED|
> project ([$$t])
> -- STREAM_PROJECT  |PARTITIONED|
>   select (eq($$7, 100))
>   -- STREAM_SELECT  |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   data-scan []<-[$$7, $$t] <- FlatDataverse.ITweets
>   -- DATASOURCE_SCAN  |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   empty-tuple-source
>   -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>
> Do we always do a scan and then filter the result, even though the query
> predicate is on the primary key?
> --
>
> *Regards,*
> Wail Alkowaileet
>


Re: The IIndexCursor interface

2017-11-30 Thread Chen Luo
A typo in previous email "internal objects should only be created during
the call of open" -> "internal objects should only be created during the
first call of open"

On Thu, Nov 30, 2017 at 4:37 PM, Chen Luo <cl...@uci.edu> wrote:

> +1.
>
> It seems to me the main issue for cursors is that a cursor sometimes needs
> to be re-used for performance reasons (e.g., during primary key lookups
> after a secondary index search). One thing to note is that when making these
> changes, or implementing new cursors, one has to be very careful that a cursor
> might be reused. As a requirement, internal objects should only be created
> during the call of open, and the close method must not clean up these objects
> (unfortunately, the previous implementation of LSMBTreePointSearchCursor [1]
> mistakenly clears its internal objects during reset, which results in tons
> of objects being created during primary key lookups).
>
> [1] https://github.com/apache/asterixdb/blob/
> 89e6a93277205a9dbc76c18e249919a745d224d2/hyracks-fullstack/
> hyracks/hyracks-storage-am-lsm-btree/src/main/java/org/
> apache/hyracks/storage/am/lsm/btree/impls/LSMBTreePointSearchCursor.
> java#L142
>
> On Thu, Nov 30, 2017 at 3:42 PM, abdullah alamoudi <bamou...@gmail.com>
> wrote:
>
>> Dear devs,
>> The IIndexCursor interface is one of the critical interfaces inside
>> AsterixDB. It is used to access tuples inside indexes; we have many
>> implementations of it, and it is used differently in different places. We
>> are trying to specify a contract for the interface that all
>> implementors/users of a cursor have to follow, to ensure consistent
>> state and no leaked resources under any circumstances.
>> email focuses on the lifecycle of cursors and on the following existing
>> methods:
>>
>> -- void open(ICursorInitialState initialState, ISearchPredicate
>> searchPred) throws HyracksDataException;
>> -- boolean hasNext() throws HyracksDataException;
>> -- void next() throws HyracksDataException;
>> -- void close() throws HyracksDataException;
>> -- void reset() throws HyracksDataException;
>>
>> Currently, these calls are "mostly" used as follows in our code:
>>
>> - If there are multiple search predicates:
>> cursor = new cursor();
>> while (more predicates){
>>   cursor.reset()
>>   cursor.open(predicate);
>>   while (cursor.hasNext()){
>> cursor.next()
>>   }
>> }
>> cursor.close();
>>
>> - If there is a single search predicate:
>> cursor = new cursor();
>> cursor.open(predicate);
>> while (cursor.hasNext()){
>>   cursor.next()
>> }
>> cursor.close();
>>
>> There are two problems with this:
>>
>> 1. There is no enforcement of any type of contract. For example, one can
>> open a cursor and reset it and then continue to read tuples from the cursor
>> as follows:
>>
>> cursor.open(predicate);
>> cursor.hasNext()
>> cursor.next()
>> cursor.reset()
>> cursor.hasNext()
>> cursor.next()
>>
>> and continue to read tuples. This is bug prone and can cause hidden bugs
>> to linger for a long time.
>>
>> 2. Naming and symmetry: open calls don't have corresponding close calls
>> "unless we know the cursor will be used with exactly one search predicate".
>> With this, the implementation of the cursor leads to either duplicate code,
>> or having close() call reset() (or the other way around) and handling of
>> special cases.
>> Moreover, when there are slight differences, often it is easy to make a
>> change in one and forget about the other.
>>
>> ==
>> To deal with these issues, we are proposing the following:
>>
>> 1. change the methods to:
>>
>> -- void open(ICursorInitialState initialState, ISearchPredicate
>> searchPred) throws HyracksDataException;
>> -- boolean hasNext() throws HyracksDataException;
>> -- void next() throws HyracksDataException;
>> -- void close(); // used to be reset()
>> -- void destroy(); // used to be close()
>>
>>
>> The call cycle becomes:
>> - If there are multiple search predicates:
>> cursor = new cursor();
>> while (more predicates){
>>   cursor.open(predicate);
>>   while (cursor.hasNext()){
>> cursor.next()
>>   }
>>   cursor.close(); // used to be reset()
>> }
>> cursor.destroy(); // used to be close()
>>
>> - If there is a single search predicate:
>> cursor = new cursor();
>> cursor.open(predicate);
>> while (cur
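
To make the proposed contract concrete, here is a minimal Java sketch of a caller driving one cursor over several search predicates under the new naming (the accessor, predicates, and consume(...) names are illustrative assumptions, and the createSearchCursor/search calls follow the Hyracks IIndexAccessor API as I understand it; this is not code from the patch):

// Sketch only: one close() per search, one destroy() at the very end.
IIndexCursor cursor = accessor.createSearchCursor(false);
try {
    for (ISearchPredicate predicate : predicates) {
        accessor.search(cursor, predicate);   // internally calls cursor.open(initialState, predicate)
        try {
            while (cursor.hasNext()) {
                cursor.next();
                consume(cursor.getTuple());   // hypothetical consumer of the current tuple
            }
        } finally {
            cursor.close();    // used to be reset(): ends this search, cursor stays reusable
        }
    }
} finally {
    cursor.destroy();          // used to be close(): releases all resources
}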

Re: About the system behavior when the checkpoint is corrupted

2017-11-29 Thread Chen Luo
I'm not sure how the checkpoint file was corrupted. For my experiments, I
have some versions of AsterixDB sharing the same storage dir (so that I can
evaluate the performance after making some changes). Recently I synced my
branch with master, and maybe this caused some problem with the checkpoint
file (e.g., different versions of the codebase?).

However, I think cleaning up the entire data directory is dangerous. The
user (such as me) can back up the checkpoint file because it's small, but it
would be cumbersome to back up the entire data directory. When there indeed
is something wrong with the checkpoint file, it's better that the user be
made aware of it and make the decision themselves.

Best regards,
Chen Luo

On Wed, Nov 29, 2017 at 2:11 PM, abdullah alamoudi <bamou...@gmail.com>
wrote:

> I wonder how it got to that state.
>
> The first thing an instance does after initialization is create the
> snapshot file.
> This will only be deleted after a new (uncorrupted) snapshot file is
> created.
>
> I understand your point, but I wonder how it got to this state. Bug!?
>
> Cheers,
> Abdullah.
>
> > On Nov 29, 2017, at 1:54 PM, Chen Luo <cl...@uci.edu> wrote:
> >
> > Hi devs,
> >
> > Recently I was experiencing a very annoying issue about recovery. The
> > checkpoint file of my dataset was somehow corrupted (and I didn't know
> > why). However, when I was restarting AsterixDB, it failed to read the
> > checkpoint file and started recovering from a clean state. This is highly
> > undesirable in the sense that it silently cleaned up all of my experiment
> > datasets, roughly 100GB. And it'll take me days to re-ingest this data to
> > resume my experiments.
> >
> > I think the behavior of cleaning up all data when some small thing goes
> > wrong is undesirable and dangerous. When AsterixDB fails to restart and
> > finds the data directory non-empty, I think it should notify the user and
> > let the user make the decision. For example, it could fail to restart at
> > this point, and the user could clean up the directory manually, or try to
> > use a backup checkpoint file, or add some flag to force the restart. Anyway,
> > blindly cleaning up all files seems to be a dangerous solution.
> >
> > Any thoughts on this?
> >
> > Best regards,
> > Chen Luo
>
>


Re: Adapting TimSort into AsterixDB/Hyracks

2017-10-28 Thread Chen Luo
That can be done in the Hyracks layer, but enabling Quicksort in AsterixDB has
other issues. Right now AsterixDB only uses merge sort because of its
stability. Once we switch to Quicksort, we have to make sure all the other
operators are happy with this change (some operators are OK with it, but
some are not).

On Sat, Oct 28, 2017 at 12:05 PM, Mike Carey <dtab...@gmail.com> wrote:

> So the word on the web seems maybe to be that Quicksort is generally
> superior and cache-friendly (cache oblivious). Wondering if we should just
> get our Quicksort code under control?
>
>
>
> On 10/28/17 11:36 AM, Chen Luo wrote:
>
>> Not exactly sure about this right now... But since TimSort is essentially a
>> combination of insertion sort and merge sort, its cache-friendliness won't
>> be worse than our merge sort's.
>>
>> This TimSort could serve as a short-term plug-and-play improvement of
>> our sorting algorithm. It is still stable, the same as our current merge
>> sort, but faster, especially on partially ordered datasets.
>>
>> Best regards,
>> Chen Luo
>>
>> On Sat, Oct 28, 2017 at 10:58 AM, Mike Carey <dtab...@gmail.com> wrote:
>>
>> How is it on cache-friendliness?
>>>
>>>
>>>
>>> On 10/27/17 11:38 PM, abdullah alamoudi wrote:
>>>
>>> While I have no answer to the question of legality, this sounds great.
>>>>
>>>> ~Abdullah.
>>>>
>>>> On Oct 27, 2017, at 9:20 PM, Chen Luo <cl...@uci.edu> wrote:
>>>>
>>>>> Hi devs,
>>>>>
>>>>> I have adapted the TimSort algorithm used in JDK (java.util.TimSort)
>>>>> into
>>>>> Hyracks, which gives 10-20% performance improvements on random data. It
>>>>> will be more useful if the input data is partially sorted, e.g.,
>>>>> primary
>>>>> keys fetched from secondary index scan, which I haven't got time to
>>>>> experiment with.
>>>>>
>>>>> *Before going any further, is it legal to adapt some algorithm
>>>>> implementation from JDK into our codebase? *I saw the JDK
>>>>> implementation
>>>>> itself is adopted from
>>>>> http://svn.python.org/projects/python/trunk/Objects/listsort.txt as
>>>>> well.
>>>>>
>>>>> Best regards,
>>>>> Chen Luo
>>>>>
>>>>>
>


Re: Adapting TimSort into AsterixDB/Hyracks

2017-10-28 Thread Chen Luo
Not exactly sure about this right now... But since TimSort is essentially a
combination of insertion sort and merge sort, its cache-friendliness won't
be worse than our merge sort's.

This TimSort could serve as a short-term plug-and-play improvement of
our sorting algorithm. It is still stable, the same as our current merge
sort, but faster, especially on partially ordered datasets.

Best regards,
Chen Luo

On Sat, Oct 28, 2017 at 10:58 AM, Mike Carey <dtab...@gmail.com> wrote:

> How is it on cache-friendliness?
>
>
>
> On 10/27/17 11:38 PM, abdullah alamoudi wrote:
>
>> While I have no answer to the question of legality, this sounds great.
>>
>> ~Abdullah.
>>
>> On Oct 27, 2017, at 9:20 PM, Chen Luo <cl...@uci.edu> wrote:
>>>
>>> Hi devs,
>>>
>>> I have adapted the TimSort algorithm used in JDK (java.util.TimSort) into
>>> Hyracks, which gives 10-20% performance improvements on random data. It
>>> will be more useful if the input data is partially sorted, e.g., primary
>>> keys fetched from secondary index scan, which I haven't got time to
>>> experiment with.
>>>
>>> *Before going any further, is it legal to adapt some algorithm
>>> implementation from JDK into our codebase? *I saw the JDK implementation
>>> itself is adopted from
>>> http://svn.python.org/projects/python/trunk/Objects/listsort.txt as
>>> well.
>>>
>>> Best regards,
>>> Chen Luo
>>>
>>
>


Re: Re: Adapting TimSort into AsterixDB/Hyracks

2017-10-28 Thread Chen Luo
Thanks guys! I think the one I'm using is identical to the one pointed out by
Wail (https://github.com/retrostreams/android-retrostreams/blob/master/src/
main/java/java9/util/TimSort.java), and also identical to the one used by
Spark. All three implementations are credited to the same person, Josh
Bloch.

Then we're good!

Best regards,
Chen Luo

On Sat, Oct 28, 2017 at 9:52 AM, Xikui Wang <xik...@uci.edu> wrote:

> I think probably it's not okay to use JDK's implementation due to the
> license issue [1]. (Search for BCL)
>
> Alternatively, you could use the one that Spark used from the Android
> Open Source Project, as Wail pointed out. That one is under the Apache License.
>
> [1] https://www.apache.org/legal/resolved.html
>
> On Sat, Oct 28, 2017 at 9:44 AM, Wail Alkowaileet <wael@gmail.com>
> wrote:
>
> > P.S Spark implementation is under Apache.
> >
> > On Sat, Oct 28, 2017 at 9:41 AM, Wail Alkowaileet <wael@gmail.com>
> > wrote:
> >
> > > Android has an implementation:
> > > https://github.com/retrostreams/android-retrostreams/blob/master/src/
> > > main/java/java9/util/TimSort.java
> > >
> > > Spark has ported it
> > > https://github.com/apache/spark/blob/master/core/src/
> > > main/java/org/apache/spark/util/collection/TimSort.java
> > >
> > > We can customize it for AsterixDB comparators.
> > >
> > > On Sat, Oct 28, 2017 at 9:28 AM, Chen Luo <cl...@uci.edu> wrote:
> > >
> > >> I don't know whether there is an easy way for us to directly reuse the
> > >> JDK's TimSort, since it's designed for sorting objects in an array, while
> > >> in Hyracks we don't have explicit object creation and sort everything in
> > >> main memory. So what I did is that I copied its source code, and replaced
> > >> all object assignments/swaps/comparisons using Hyracks in-memory operations.
> > >>
> > >> Best regards,
> > >> Chen Luo
> > >>
> > >> On Sat, Oct 28, 2017 at 12:07 AM, 李文海 <8...@whu.edu.cn> wrote:
> > >>
> > >> > I believe reusing jdk afap could be better. btw, timsort is better
> > than
> > >> > others by 1x when records are locally ordered .
> > >> > best
> > >> >
> > >> > On 2017-10-28 14:38:21, "abdullah alamoudi" <bamou...@gmail.com> wrote:
> > >> >
> > >> > >While I have no answer to the question of legality, this sounds
> > great.
> > >> > >
> > >> > >~Abdullah.
> > >> > >
> > >> > >> On Oct 27, 2017, at 9:20 PM, Chen Luo <cl...@uci.edu> wrote:
> > >> > >>
> > >> > >> Hi devs,
> > >> > >>
> > >> > >> I have adapted the TimSort algorithm used in JDK
> > (java.util.TimSort)
> > >> > into
> > >> > >> Hyracks, which gives 10-20% performance improvements on random
> > data.
> > >> It
> > >> > >> will be more useful if the input data is partially sorted, e.g.,
> > >> primary
> > >> > >> keys fetched from secondary index scan, which I haven't got time
> to
> > >> > >> experiment with.
> > >> > >>
> > >> > >> *Before going any further, is it legal to adapt some algorithm
> > >> > >> implementation from JDK into our codebase? *I saw the JDK
> > >> implementation
> > >> > >> itself is adopted from
> > >> > >> http://svn.python.org/projects/python/trunk/Objects/listsort.txt
> > as
> > >> > well.
> > >> > >>
> > >> > >> Best regards,
> > >> > >> Chen Luo
> > >> > >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > *Regards,*
> > > Wail Alkowaileet
> > >
> >
> >
> >
> > --
> >
> > *Regards,*
> > Wail Alkowaileet
> >
>


Re: Re: Adapting TimSort into AsterixDB/Hyracks

2017-10-28 Thread Chen Luo
I don't know whether there is an easy way for us to directly reuse the JDK's
TimSort, since it's designed for sorting objects in an array, while in Hyracks
we don't have explicit object creation and sort everything in main memory.
So what I did is that I copied its source code, and replaced all object
assignments/swaps/comparisons using Hyracks in-memory operations.
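
For anyone curious what "replacing object assignments/swaps/comparisons" looks like, here is a minimal sketch of TimSort's run-detection step rewritten against an index-based compare/swap interface (the IndexedSortable interface below is a hypothetical stand-in for the Hyracks frame-sorter callbacks, not the actual API):

// Sketch only: the JDK version works on an Object[]; here every access goes through
// compare(i, j) / swap(i, j), which in Hyracks would be backed by tuple pointers into frames.
interface IndexedSortable {
    int compare(int i, int j);   // compare the tuples at positions i and j
    void swap(int i, int j);     // swap the tuple pointers at positions i and j
}

// Find the run starting at lo (exclusive upper bound hi) and make it ascending,
// exactly as TimSort does, but without materializing any objects.
static int countRunAndMakeAscending(IndexedSortable s, int lo, int hi) {
    int runHi = lo + 1;
    if (runHi == hi) {
        return 1;
    }
    if (s.compare(runHi++, lo) < 0) {                            // strictly descending run
        while (runHi < hi && s.compare(runHi, runHi - 1) < 0) {
            runHi++;
        }
        for (int i = lo, j = runHi - 1; i < j; i++, j--) {       // reverse it in place
            s.swap(i, j);
        }
    } else {                                                     // ascending run
        while (runHi < hi && s.compare(runHi, runHi - 1) >= 0) {
            runHi++;
        }
    }
    return runHi - lo;
}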

Best regards,
Chen Luo

On Sat, Oct 28, 2017 at 12:07 AM, 李文海 <8...@whu.edu.cn> wrote:

> I believe reusing the JDK afap could be better. BTW, TimSort is better than
> others by 1x when records are locally ordered.
> best
>
> On 2017-10-28 14:38:21, "abdullah alamoudi" <bamou...@gmail.com> wrote:
>
> >While I have no answer to the question of legality, this sounds great.
> >
> >~Abdullah.
> >
> >> On Oct 27, 2017, at 9:20 PM, Chen Luo <cl...@uci.edu> wrote:
> >>
> >> Hi devs,
> >>
> >> I have adapted the TimSort algorithm used in JDK (java.util.TimSort)
> into
> >> Hyracks, which gives 10-20% performance improvements on random data. It
> >> will be more useful if the input data is partially sorted, e.g., primary
> >> keys fetched from secondary index scan, which I haven't got time to
> >> experiment with.
> >>
> >> *Before going any further, is it legal to adapt some algorithm
> >> implementation from JDK into our codebase? *I saw the JDK implementation
> >> itself is adopted from
> >> http://svn.python.org/projects/python/trunk/Objects/listsort.txt as
> well.
> >>
> >> Best regards,
> >> Chen Luo
> >
>
>


Adapting TimSort into AsterixDB/Hyracks

2017-10-27 Thread Chen Luo
Hi devs,

I have adapted the TimSort algorithm used in JDK (java.util.TimSort) into
Hyracks, which gives 10-20% performance improvements on random data. It
will be more useful if the input data is partially sorted, e.g., primary
keys fetched from secondary index scan, which I haven't got time to
experiment with.

*Before going any further, is it legal to adapt some algorithm
implementation from JDK into our codebase? *I saw the JDK implementation
itself is adopted from
http://svn.python.org/projects/python/trunk/Objects/listsort.txt as well.

Best regards,
Chen Luo


Re: Question about AsterixDB Functions

2017-09-28 Thread Chen Luo
NVM, I think I can implement a customized aggregate function for this
purpose (though its semantics are slightly different from traditional SQL
aggregate functions, which only produce one result value). Please correct me
if my understanding is wrong...
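
For what it's worth, here is a minimal sketch of what such an aggregate could look like against the Algebricks IAggregateEvaluator interface (method names are as I recall them; imports are omitted, and the component-Id extraction/output helpers as well as the "narrowest interval" selection policy are purely illustrative assumptions):

// Sketch only: for one group (one primary key), scan the input tuples and keep the
// "best" component Id interval seen so far; finish() writes that interval out.
abstract class BestComponentIdAggregate implements IAggregateEvaluator {
    private long bestStart;
    private long bestEnd;
    private boolean empty;

    // Hypothetical helpers: how the component Id is encoded in the input tuple
    // and in the result is left abstract here.
    protected abstract long readComponentIdStart(IFrameTupleReference tuple) throws HyracksDataException;
    protected abstract long readComponentIdEnd(IFrameTupleReference tuple) throws HyracksDataException;
    protected abstract void writeInterval(IPointable result, long start, long end) throws HyracksDataException;

    @Override
    public void init() throws HyracksDataException {
        empty = true;
    }

    @Override
    public void step(IFrameTupleReference tuple) throws HyracksDataException {
        long start = readComponentIdStart(tuple);
        long end = readComponentIdEnd(tuple);
        // Example policy matching the thread: prefer the narrowest interval, e.g. [2, 2] over [1, 3].
        if (empty || (end - start) < (bestEnd - bestStart)) {
            bestStart = start;
            bestEnd = end;
            empty = false;
        }
    }

    @Override
    public void finish(IPointable result) throws HyracksDataException {
        writeInterval(result, bestStart, bestEnd);
    }

    @Override
    public void finishPartial(IPointable result) throws HyracksDataException {
        finish(result);
    }
}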

On Wed, Sep 27, 2017 at 8:43 PM, Chen Luo <cl...@uci.edu> wrote:

> Hi Devs,
>
> Recently I was facing a problem with the IntersectOperator. Previously
> we've used the IntersectOperator to intersect primary keys returned from
> searching multiple secondary indexes, and used them to perform primary key
> lookups. However, with component Id-based acceleration, each primary key
> returned from a secondary index would carry a component Id (which is two
> numbers). Thus, inside the IntersectOperator, we only intersect the primary
> keys, but we also need to select a proper component Id based on these inputs. For
> example, for 3 input tuples (a, [1, 3]), (a, [0, 2]) and (a, [2, 2]), where
> 'a' is the primary key and the interval is the component Id, we may return
> (a, [2, 2]) as the output of the intersection.
>
> Thus, my question is: *is there any function interface inside
> AsterixDB which takes a list of input tuples and produces a tuple as a
> result*? With this functionality, we can devise strategies to select the
> best component Id for each primary key. Any help is appreciated!
>
> Best regards,
> Chen Luo
>


Question about AsterixDB Functions

2017-09-27 Thread Chen Luo
Hi Devs,

Recently I was facing a problem with the IntersectOperator. Previously
we've used the IntersectOperator to intersect primary keys returned from
searching multiple secondary indexes, and used them to perform primary key
lookups. However, with component Id-based acceleration, each primary key
returned from a secondary index would carry a component Id (which is two
numbers). Thus, inside the IntersectOperator, we only intersect the primary
keys, but we also need to select a proper component Id based on these inputs. For
example, for 3 input tuples (a, [1, 3]), (a, [0, 2]) and (a, [2, 2]), where
'a' is the primary key and the interval is the component Id, we may return
(a, [2, 2]) as the output of the intersection.

Thus, my question is: *is there any function interface inside AsterixDB
which takes a list of input tuples and produces a tuple as a result*? With
this functionality, we can devise strategies to select the best component
Id for each primary key. Any help is appreciated!

Best regards,
Chen Luo


Verify fails on SQLPP Execution Test

2017-06-08 Thread Chen Luo
Hi devs,

After submitting a patch today, I saw that the verification failed because of the
SQLPP Execution Tests. It seems to me that the test fails because of some
character set issue when testing the inverted index, e.g.:

< { "id": 90, "title": "VORTEX  Video Retrieval and Tracking from
Compressed Multimedia Databases ¾ Visual Search Engine." }
> { "id": 90, "title": "VORTEX  Video Retrieval and Tracking from Compressed 
> Multimedia Databases ? Visual Search Engine." }


Any suggestions to fix this? Thanks.


Best regards,

Chen Luo