Re: /rest/jobs visualization

2018-10-23 Thread Jianfeng Jia
I remember using a Chrome extension, e.g.,
https://chrome.google.com/webstore/detail/json-viewer/gbmdgpbipfallnflgajpaliibnhdgobh?hl=en-US,
to navigate the JSON records.
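
For a quick look without a browser extension, a few lines of Python can pretty-print the same output. This is only a sketch; the host and port follow the sample-cluster URLs quoted later in this digest (http://localhost:16001) and will differ per deployment.

import json
import urllib.request

BASE = "http://localhost:16001"  # CC REST port of the sample cluster

def pretty(path):
    # Fetch a REST endpoint and print its JSON with indentation.
    with urllib.request.urlopen(BASE + path) as resp:
        print(json.dumps(json.load(resp), indent=2))

pretty("/rest/jobs/")               # all recent jobs
pretty("/rest/jobs/JID:0/job-run")  # job-run details for one job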

On Tue, Oct 23, 2018 at 8:51 PM Ian Maxon  wrote:

> Hey guys,
> Does anyone have a way right now of visualizing the output of the
> JSON form of the job-run and job-activity-graph? I know we used to
> have a servlet for it, but since we got rid of that, it's been
> asked about, so I assume it must be used somehow at the moment...
> Thanks,
> - Ian
>


-- 

-
Best Regards

Jianfeng Jia
Ph.D. of Computer Science
University of California, Irvine


Re: Searching for duplicates during feed ingestion.

2017-05-08 Thread Jianfeng Jia
Got the point now…
I would imagine that if the record had a version number, it could potentially
solve some of the problems here. However, that would be a totally different story then…

> On May 8, 2017, at 12:39 PM, Mike Carey <dtab...@gmail.com> wrote:
> 
> Note that upserts don't avoid searches. (We still need to get the old record 
> in order to update the secondary indexes.)
> 
> 
> On 5/8/17 12:10 PM, Jianfeng Jia wrote:
>> Aha, never knew that before. We will definitely try upsert feed next time! 
>> Thanks for pointing it out!
>> 
>>> On May 8, 2017, at 12:07 PM, Ildar Absalyamov <ildar.absalya...@gmail.com> 
>>> wrote:
>>> 
>>> I believe we already support upsert feeds ;)
>>> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql
>>>  
>>> <https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql>
>>>> On May 8, 2017, at 12:04, Jianfeng Jia <jianfeng@gmail.com> wrote:
>>>> 
>>>> I also observe this getting-slower problem every time we re-ingest
>>>> the twitter data. One difference is that duplicate keys can happen,
>>>> and we know they are indeed duplicate records. To skip the search, we
>>>> would expect “upsert” logic (just replace the old one :-) ) instead of
>>>> an insert.
>>>> 
>>>> Then maybe we can add an option to the feed configuration, like
>>>> 
>>>> create feed MessageFeed using localfs(
>>>> ("format"="adm"),
>>>> ("type-name"="typeX"),
>>>> ("upsert"="true")
>>>> );
>>>> 
>>>> to indicate that this feed uses the upsert logic instead of insert.
>>>> 
>>>> One thing we need to confirm is whether “upsert” is actually
>>>> implemented in a no-search fashion.
>>>> Based on the way we search the components, only the most recent one
>>>> will be popped out, so a blind insert should be OK logically. Correct
>>>> me if I missed some other cases (highly likely :-)).
>>>> 
>>>> 
>>>>> On May 8, 2017, at 11:05 AM, Mike Carey <dtab...@gmail.com> wrote:
>>>>> 
>>>>> +0.99 from me.
>>>>> 
>>>>> 
>>>>> On 5/8/17 9:50 AM, Taewoo Kim wrote:
>>>>>> +1 for auto-generated ID case
>>>>>> 
>>>>>> Best,
>>>>>> Taewoo
>>>>>> 
>>>>>> On Mon, May 8, 2017 at 8:57 AM, Yingyi Bu <buyin...@gmail.com> wrote:
>>>>>> 
>>>>>>> Abdullah has a pending change that disables searches if there's no
>>>>>>> secondary indexes [1].
>>>>>>> Auto-generated ID could be another case for which we can disable 
>>>>>>> searches
>>>>>>> as well.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Yingyi
>>>>>>> 
>>>>>>> [1] https://asterix-gerrit.ics.uci.edu/#/c/1711/
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, May 8, 2017 at 4:30 AM, Wail Alkowaileet <wael@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Devs,
>>>>>>>> 
>>>>>>>> I'm noticing during ingestion that it's getting slower over time.
>>>>>>>> I know that is expected behavior for LSM indexes. But what I'm
>>>>>>>> seeing is a noticeable drop in the ingestion rate roughly after
>>>>>>>> having 10 components (around ~13 GB). I'm not sure whether that's
>>>>>>>> expected.
>>>>>>>> 
>>>>>>>> I tried multiple setups (increasing the memory component size +
>>>>>>>> max-mergable-component-size). All of them delayed the problem but
>>>>>>>> did not solve it. The only part I've never changed is the
>>>>>>>> bloom-filter false-positive rate (1%), which I want to investigate
>>>>>>>> next.
>>>>>>>> 
>>>>>>>> So..
>>>>>>>> What I want to suggest is: when the primary key is auto-generated,
>>>>>>>> why does AsterixDB look for duplicates? It seems a wasteful
>>>>>>>> operation to me. Also, can we give the user the ability to tell the
>>>>>>>> index that all keys are unique? I know I should not trust the user
>>>>>>>> .. but in certain cases, probably the user is certain that the key
>>>>>>>> is unique. Or a more elegant solution can shine in the end :-)
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> *Regards,*
>>>>>>>> Wail Alkowaileet
>>>>>>>> 
>>> Best regards,
>>> Ildar
>>> 
> 



Re: Searching for duplicates during feed ingestion.

2017-05-08 Thread Jianfeng Jia
Aha, never knew that before. We will definitely try upsert feed next time! 
Thanks for pointing it out!

> On May 8, 2017, at 12:07 PM, Ildar Absalyamov <ildar.absalya...@gmail.com> 
> wrote:
> 
> I believe we already support upsert feeds ;)
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql
>  
> <https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql>
>> On May 8, 2017, at 12:04, Jianfeng Jia <jianfeng@gmail.com> wrote:
>> 
>> I also observe this getting-slower problem every time we re-ingest the
>> twitter data. One difference is that duplicate keys can happen, and we
>> know they are indeed duplicate records. To skip the search, we would
>> expect “upsert” logic (just replace the old one :-) ) instead of an insert.
>> 
>> Then maybe we can add an option to the feed configuration, like
>> 
>> create feed MessageFeed using localfs(
>> ("format"="adm"),
>> ("type-name"="typeX"),
>> ("upsert"="true")
>> );
>> 
>> to indicate that this feed uses the upsert logic instead of insert.
>> 
>> One thing we need to confirm is whether “upsert” is actually implemented
>> in a no-search fashion.
>> Based on the way we search the components, only the most recent one will
>> be popped out, so a blind insert should be OK logically. Correct me if I
>> missed some other cases (highly likely :-)).
>> 
>> 
>>> On May 8, 2017, at 11:05 AM, Mike Carey <dtab...@gmail.com> wrote:
>>> 
>>> +0.99 from me.
>>> 
>>> 
>>> On 5/8/17 9:50 AM, Taewoo Kim wrote:
>>>> +1 for auto-generated ID case
>>>> 
>>>> Best,
>>>> Taewoo
>>>> 
>>>> On Mon, May 8, 2017 at 8:57 AM, Yingyi Bu <buyin...@gmail.com> wrote:
>>>> 
>>>>> Abdullah has a pending change that disables searches if there's no
>>>>> secondary indexes [1].
>>>>> Auto-generated ID could be another case for which we can disable searches
>>>>> as well.
>>>>> 
>>>>> Best,
>>>>> Yingyi
>>>>> 
>>>>> [1] https://asterix-gerrit.ics.uci.edu/#/c/1711/
>>>>> 
>>>>> 
>>>>> On Mon, May 8, 2017 at 4:30 AM, Wail Alkowaileet <wael@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi Devs,
>>>>>> 
>>>>>> I'm noticing during ingestion that it's getting slower over time. I
>>>>>> know that is expected behavior for LSM indexes. But what I'm seeing is
>>>>>> a noticeable drop in the ingestion rate roughly after having 10
>>>>>> components (around ~13 GB). I'm not sure whether that's expected.
>>>>>> 
>>>>>> I tried multiple setups (increasing the memory component size +
>>>>>> max-mergable-component-size). All of them delayed the problem but did
>>>>>> not solve it. The only part I've never changed is the bloom-filter
>>>>>> false-positive rate (1%), which I want to investigate next.
>>>>>> 
>>>>>> So..
>>>>>> What I want to suggest is: when the primary key is auto-generated, why
>>>>>> does AsterixDB look for duplicates? It seems a wasteful operation to
>>>>>> me. Also, can we give the user the ability to tell the index that all
>>>>>> keys are unique? I know I should not trust the user .. but in certain
>>>>>> cases, probably the user is certain that the key is unique. Or a more
>>>>>> elegant solution can shine in the end :-)
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> *Regards,*
>>>>>> Wail Alkowaileet
>>>>>> 
>>> 
>> 
> 
> Best regards,
> Ildar
> 



Re: Searching for duplicates during feed ingestion.

2017-05-08 Thread Jianfeng Jia
I also observe this getting-slower problem every time we re-ingest the 
twitter data. One difference is that duplicate keys can happen, and we know 
they are indeed duplicate records. To skip the search, we would expect 
“upsert” logic (just replace the old one :-) ) instead of an insert. 

Then maybe we can add an option to the feed configuration, like

create feed MessageFeed using localfs(
("format"="adm"),
("type-name"="typeX"),
("upsert"="true")
);

to indicate that this feed uses the upsert logic instead of insert. 

One thing we need to confirm is whether “upsert” is actually implemented in a 
no-search fashion. 
Based on the way we search the components, only the most recent one will be 
popped out, so a blind insert should be OK logically. Correct me if I missed 
some other cases (highly likely :-)).
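
To make the open question concrete, here is a hypothetical sketch (plain Python, not AsterixDB code) of the logic under discussion: reads probe LSM components newest-first, so a blind write is logically safe for primary-key lookups, but (as Mike points out elsewhere in this thread) an upsert still has to search for the old record whenever secondary indexes must be kept consistent.

# Hypothetical sketch of the trade-off discussed above; not AsterixDB code.

class SecondaryIndex:
    # Minimal stand-in: maps a secondary key to the current record.
    def __init__(self, key_of):
        self.key_of = key_of  # function extracting the secondary key
        self.entries = {}

    def replace(self, old_record, new_record):
        if old_record is not None:
            self.entries.pop(self.key_of(old_record), None)
        self.entries[self.key_of(new_record)] = new_record

class LsmIndex:
    def __init__(self):
        self.components = [{}]  # dicts of key -> record, newest first

    def lookup(self, key):
        # Reads probe components newest-first, so the latest version of a
        # key shadows any older versions living in older components.
        for component in self.components:
            if key in component:
                return component[key]
        return None

    def insert(self, key, record):
        # A plain insert must search in order to detect duplicate keys.
        if self.lookup(key) is not None:
            raise KeyError("duplicate key: %s" % key)
        self.components[0][key] = record

    def upsert(self, key, record, secondary_indexes=()):
        # With no secondary indexes, blindly writing into the newest
        # component is logically fine: lookup() will surface this version.
        # With secondary indexes, the old record must still be fetched so
        # its stale secondary entries can be removed first.
        if secondary_indexes:
            old = self.lookup(key)  # the search that upsert cannot avoid
            for index in secondary_indexes:
                index.replace(old, record)
        self.components[0][key] = record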
 

> On May 8, 2017, at 11:05 AM, Mike Carey  wrote:
> 
> +0.99 from me.
> 
> 
> On 5/8/17 9:50 AM, Taewoo Kim wrote:
>> +1 for auto-generated ID case
>> 
>> Best,
>> Taewoo
>> 
>> On Mon, May 8, 2017 at 8:57 AM, Yingyi Bu  wrote:
>> 
>>> Abdullah has a pending change that disables searches if there's no
>>> secondary indexes [1].
>>> Auto-generated ID could be another case for which we can disable searches
>>> as well.
>>> 
>>> Best,
>>> Yingyi
>>> 
>>> [1] https://asterix-gerrit.ics.uci.edu/#/c/1711/
>>> 
>>> 
>>> On Mon, May 8, 2017 at 4:30 AM, Wail Alkowaileet 
>>> wrote:
>>> 
>>>> Hi Devs,
>>>> 
>>>> I'm noticing during ingestion that it's getting slower over time. I know
>>>> that is expected behavior for LSM indexes. But what I'm seeing is a
>>>> noticeable drop in the ingestion rate roughly after having 10 components
>>>> (around ~13 GB). I'm not sure whether that's expected.
>>>> 
>>>> I tried multiple setups (increasing the memory component size +
>>>> max-mergable-component-size). All of them delayed the problem but did
>>>> not solve it. The only part I've never changed is the bloom-filter
>>>> false-positive rate (1%), which I want to investigate next.
>>>> 
>>>> So..
>>>> What I want to suggest is: when the primary key is auto-generated, why
>>>> does AsterixDB look for duplicates? It seems a wasteful operation to me.
>>>> Also, can we give the user the ability to tell the index that all keys
>>>> are unique? I know I should not trust the user .. but in certain cases,
>>>> probably the user is certain that the key is unique. Or a more elegant
>>>> solution can shine in the end :-)
>>>> 
>>>> --
>>>> 
>>>> *Regards,*
>>>> Wail Alkowaileet
>>>> 
> 



Re: What is the new path to check Hyracks jobs status in AsterixDB?

2017-04-07 Thread Jianfeng Jia
That’s a good idea. I hadn’t thought about the browser plugin. Now it looks 
better!

> On Apr 7, 2017, at 4:34 PM, Till Westmann <ti...@apache.org> wrote:
> 
> Since the endpoints return JSON, using a JSON formatter plugin for the 
> browser seems easier.
> Otherwise I think that we’ll need to create a page around it (which is 
> clearly feasible as well).
> 
> On 7 Apr 2017, at 16:21, Mike Carey wrote:
> 
>> Could we use the same library that Xikui used for JSON (formatted) as a baby 
>> step?
>> 
>> 
>> On 4/7/17 10:04 AM, Jianfeng Jia wrote:
>>> Got it. (do we have any plan to beautify the UI? :-)
>>> Thanks!
>>> 
>>>> On Apr 7, 2017, at 9:46 AM, Yingyi Bu <buyin...@gmail.com> wrote:
>>>> 
>>>> Hi Jianfeng,
>>>> 
>>>> The admin console has been removed but the REST APIs which return JSON
>>>> results are still there.
>>>> 
>>>> Let's take the sample cluster as an example.
>>>> To check nodes:
>>>> http://localhost:16001/rest/nodes/
>>>> http://localhost:16001/rest/nodes/red
>>>> http://localhost:16001/rest/nodes/blue
>>>> 
>>>> To check jobs:
>>>> http://localhost:16001/rest/jobs/
>>>> http://localhost:16001/rest/jobs/JID:0/job-run
>>>> 
>>>> Best,
>>>> Yingyi
>>>> 
>>>> 
>>>> On Thu, Apr 6, 2017 at 5:18 PM, Jianfeng Jia <jianfeng@gmail.com> 
>>>> wrote:
>>>> 
>>>>> Dear Devs,
>>>>> 
>>>>> We used to have a Hyracks adminconsole web page at xxx:/adminconsole (or
>>>>> on port 16001 if not using managix) which showed the details of the
>>>>> recent jobs. By clicking into each job we could see the Activity Cluster
>>>>> Graph, Job Timeline, etc.
>>>>> 
>>>>> It’s very useful to have an overview of the current system workload
>>>>> (e.g., how many queries are running, when they were submitted, how long
>>>>> they have run …).
>>>>> Right now, the same link returns the following error:
>>>>> page can’t be found
>>>>> 
>>>>> I’m wondering what is the new path to get the same information? Thanks!
>>>>> 
>>>>> 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Jianfeng Jia
>>>>> PhD Candidate of Computer Science
>>>>> University of California, Irvine
>>>>> 
>>>>> 



Re: What is the new path to check Hyracks jobs status in AsterixDB?

2017-04-07 Thread Jianfeng Jia
Got it. (do we have any plan to beautify the UI? :-)
Thanks!

> On Apr 7, 2017, at 9:46 AM, Yingyi Bu <buyin...@gmail.com> wrote:
> 
> Hi Jianfeng,
> 
> The admin console has been removed but the REST APIs which return JSON
> results are still there.
> 
> Let's take the sample cluster as an example.
> To check nodes:
> http://localhost:16001/rest/nodes/
> http://localhost:16001/rest/nodes/red
> http://localhost:16001/rest/nodes/blue
> 
> To check jobs:
> http://localhost:16001/rest/jobs/
> http://localhost:16001/rest/jobs/JID:0/job-run
> 
> Best,
> Yingyi
> 
> 
> On Thu, Apr 6, 2017 at 5:18 PM, Jianfeng Jia <jianfeng@gmail.com> wrote:
> 
>> Dear Devs,
>> 
>> We used to have a Hyracks adminconsole web page at xxx:/adminconsole (or
>> on port 16001 if not using managix) which showed the details of the
>> recent jobs. By clicking into each job we could see the Activity Cluster
>> Graph, Job Timeline, etc.
>> 
>> It’s very useful to have an overview of the current system workload (e.g.,
>> how many queries are running, when they were submitted, how long they have
>> run …).
>> Right now, the same link returns the following error:
>> page can’t be found
>> 
>> I’m wondering what is the new path to get the same information? Thanks!
>> 
>> 
>> 
>> Best,
>> 
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>> 
>> 



What is the new path to check Hyracks jobs status in AsterixDB?

2017-04-06 Thread Jianfeng Jia
Dear Devs,

We used to have a Hyracks adminconsole web page at xxx:/adminconsole (or on 
port 16001 if not using managix) which showed the details of the recent 
jobs. By clicking into each job we could see the Activity Cluster Graph, Job 
Timeline, etc. 

It’s very useful to have an overview of the current system workload (e.g., how 
many queries are running, when they were submitted, how long they have run …). 
Right now, the same link returns the following error:
page can’t be found

I’m wondering what is the new path to get the same information? Thanks!



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine



Re: Add the Java driver for Asterix

2017-03-03 Thread Jianfeng Jia
Hi Jabbar,

It’s nice that you are interested in the project.

@devs, here is the issue I posted for GSOC2016 
https://issues.apache.org/jira/browse/ASTERIXDB-1369 
<https://issues.apache.org/jira/browse/ASTERIXDB-1369> , and I manually updated 
to GSOC2017. 
I feel there is a need for it, but I’m not an expert on JDBC. Do you think it’s a 
good project, and would anyone interested be willing to give detailed 
instructions for it? Otherwise, I will close this issue and give it a very low 
priority. 

> On Mar 3, 2017, at 10:19 AM, jabbar memon <memonjabb...@gmail.com> wrote:
> 
> Hi Jianfeng Jia,
> I am a postgraduate at Dhirubhai Ambani Institute of Information and
> Communication Technology, Ahmedabad. I'd like to contribute to this project
> in GSOC 2017. I have good knowledge of Java, and I'd like to know more
> about this project. It is new for me, so it will be a great challenge and
> excitement.
> 
> Thanks
> Jabbar Memon



Re: Choosing defaults for AsterixDB

2017-02-03 Thread Jianfeng Jia
Hi, I want to pick up this thread to verify whether AQL will still be supported 
in the future. 
Currently, Cloudberry automatically translates JSON requests into AQL 
statements. It would be hard work to switch to SQL++. 

I don’t object to setting the default option to SQL++. However, we will keep the 
support for AQL, right?  (Especially in the REST API.)

> On Jan 10, 2017, at 5:47 PM, Till Westmann  wrote:
> 
> Ok, since there’s a lot of agreement and no concerns, I’ll go ahead.
> 
> Thanks,
> Till
> 
> On 10 Jan 2017, at 9:22, Yingyi Bu wrote:
> 
>> +100!
>> 
>> On Tue, Jan 10, 2017 at 9:17 AM, Mike Carey  wrote:
>> 
>>> +1 from me too for SQL++ and clean JSON.
>>> 
>>> 
>>> 
>>> On 1/10/17 8:25 AM, Murtadha Hubail wrote:
>>> 
 +1 to SQL++ and clean JSON.
 
 Cheers,
 Murtadha
 
 On Jan 10, 2017, at 9:46 AM, Till Westmann  wrote:
> 
> Hi,
> 
> as you know AsterixDB supports 2 query languages (AQL and SQL++) and many
> output formats (ADM, clean JSON, lossless JSON, CSV). Our current
> defaults
> for these options (at least on the web interface) are AQL and ADM.
> 
> I’d like to propose to change those defaults to be SQL++ and (clean)
> JSON.
> The reason for wanting them to change, is that I think that these choices
> are more attractive to new users of the system and thus can help to
> increase
> the adoption of AsterixDB. A user with some database experience is much
> more
> likely to have previous SQL experience and to feel at home with SQL++
> than
> having XQuery experience and feeling at home with AQL. Similarly, most
> users
> will want to use the data that they get out of AsterixDB in an
> application
> and it will be a lot easier to consume JSON than it is to consume ADM.
> 
> I've prepared a (tiny) change to change the defaults [1] and I'm
> wondering
> if there are concerns that should keep us from making this change.
> 
> Cheers,
> Till
> 
> [1] https://asterix-gerrit.ics.uci.edu/#/c/1409/
> 
 
>>> 



Re: [VOTE] Release Apache AsterixDB 0.9.0 and Hyracks 0.3.0 (RC2)

2017-01-21 Thread Jianfeng Jia
+1

- signatures and hash checks
- source compilation works
- nc service works

> On Jan 21, 2017, at 6:03 PM, Yingyi Bu  wrote:
> 
> +1
> 
> - signatures and hashes of all 5 archives ok
> - nc service binary works
> - version api agrees with the commit id on ASF repo
> - source compilation works
> 
> Best,
> Yingyi
> 
> 
> On Sat, Jan 21, 2017 at 9:16 AM, Steven Jacobs  wrote:
> 
>> +1
>> Steven
>> 
>> On Sat, Jan 21, 2017 at 7:36 AM Till Westmann  wrote:
>> 
>>> +1
>>> 
>>> - signature and hashes of all 5 archives ok
>>> - source archives agree with commit ids
>>> - LICENSE + NOTICE look good for all archives
>>> - source files have headers
>>> - no unexpected binaries
>>> - compilation works
>>> 
>>> Till
>>> 
>>> On 19 Jan 2017, at 4:50, Ian Maxon wrote:
>>> 
>>>> Hi again everyone,
>>>> 
>>>> Please verify and vote on the first non-incubating Apache AsterixDB
>>>> Release!
>>>> This 2nd RC addresses build issues noticed in the previous RC, along with
>>>> some minor license tweaks.
>>>> This release utilizes a series of improvements around the actual release
>>>> process that will hopefully shorten the interval between releases. A
>>>> further email detailing the features contained in this release as
>>>> compared to the previous incubating release will be forthcoming once a
>>>> suitable RC passes voting.
>>>> 
>>>> The tags to be voted on are:
>>>> 
>>>> apache-asterixdb-0.9.0-rc2
>>>> commit: 4383bdde78c02d597be65ecf467c5a7df85a2055
>>>> link:
>>>> https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;a=tag;h=refs/tags/apache-asterixdb-0.9.0-rc2
>>>> 
>>>> and
>>>> 
>>>> apache-hyracks-0.3.0-rc2
>>>> commit: def643d586b62b2616b8ab8e6fc3ba598cf5ad67
>>>> link:
>>>> https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;a=tag;h=refs/tags/apache-hyracks-0.3.0-rc2
>>>> 
>>>> The artifacts, sha1s, and signatures (for each artifact) are at:
>>>> 
>>>> AsterixDB Source
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.0-source-release.zip
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.0-source-release.zip.asc
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.0-source-release.zip.sha1
>>>> 
>>>> SHA1: 49f8df822c6273a310027d3257a79afb45c8d446
>>>> 
>>>> Hyracks Source
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.0-source-release.zip
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.0-source-release.zip.asc
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.0-source-release.zip.sha1
>>>> 
>>>> SHA1: 4d042cab164347f0cc5cc1cfb3da8d4f02eea1de
>>>> 
>>>> AsterixDB NCService Installer:
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.0-binary-assembly.zip
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.0-binary-assembly.zip.asc
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.0-binary-assembly.zip.sha1
>>>> 
>>>> SHA1: 46c4cc3dc09e915d4b1bc6f912faef389488fdb6
>>>> 
>>>> AsterixDB Managix Installer
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-installer-0.9.0-binary-assembly.zip
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-installer-0.9.0-binary-assembly.zip.asc
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-installer-0.9.0-binary-assembly.zip.sha1
>>>> 
>>>> SHA1: 41497dbadb0ad281ba0a10ee87eaa5f7afa78cef
>>>> 
>>>> AsterixDB YARN Installer
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-yarn-0.9.0-binary-assembly.zip
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-yarn-0.9.0-binary-assembly.zip.asc
>>>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-yarn-0.9.0-binary-assembly.zip.sha1
>>>> 
>>>> SHA1: 3ade0d2957e7f3e465e357aced6712ef72598613
>>>> 
>>>> Additionally, a staged maven repository is available at:
>>>> 
>>>> https://repository.apache.org/content/repositories/orgapacheasterix-1024/
>>>> 
>>>> The KEYS file containing the PGP keys used to sign the release can be
>>>> found at
>>>> 
>>>> https://dist.apache.org/repos/dist/release/asterixdb/KEYS
>>>> 
>>>> RAT was executed as part of Maven via the RAT maven plugin, but
>>>> excludes files that are:
>>>> 
>>>> - data for tests
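
For anyone re-running the "signatures and hash checks" mentioned in the votes above, the published .sha1 values can be confirmed with a short script. This is a minimal sketch, assuming the archives have been downloaded into the current directory; the expected digests are copied verbatim from Ian's email.

import hashlib

expected = {
    "apache-asterixdb-0.9.0-source-release.zip": "49f8df822c6273a310027d3257a79afb45c8d446",
    "apache-hyracks-0.3.0-source-release.zip": "4d042cab164347f0cc5cc1cfb3da8d4f02eea1de",
    "asterix-server-0.9.0-binary-assembly.zip": "46c4cc3dc09e915d4b1bc6f912faef389488fdb6",
}

for name, sha1 in expected.items():
    digest = hashlib.sha1()
    with open(name, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MB chunks
            digest.update(chunk)
    print(name, "OK" if digest.hexdigest() == sha1 else "MISMATCH")

The .asc signature files additionally need a gpg --verify against the KEYS file linked above.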

Re: How to set the lsm component size?

2016-10-13 Thread Jianfeng Jia
Nice!
> On Oct 13, 2016, at 2:56 PM, Taewoo Kim <wangs...@gmail.com> wrote:
> 
> The explanation changes for the two parameters have been merged into the
> master.
> 
> https://asterix-gerrit.ics.uci.edu/#/c/1281/3/asterixdb/asterix-installer/src/main/resources/conf/asterix-configuration.xml
> 
> Best,
> Taewoo
> 
> On Mon, Sep 12, 2016 at 5:02 PM, Taewoo Kim <wangs...@gmail.com> wrote:
> 
>> Thanks to Sattam, here is the revised version. Feel free to revise this. I
>> will upload a patch set after some revision is done.
>> 
>> *storage.memorycomponent.numpages*
>> 
>> The number of pages to allocate for a memory component. (Default = 256)
>> This budget is shared by all the memory components of the primary index
>> and all its secondary indexes across all I/O devices on a node.
>> Note: in-memory components usually have a fill factor of 75%, since the
>> pages are 75% full and the remaining 25% is un-utilized.
>> 
>> 
>> *storage.memorycomponent.globalbudget*
>> 
>> [4GB + 100MB] The total size of memory in bytes that the sum of all open
>> memory components cannot exceed. (Default = 512MB)
>> Consider this as the buffer cache for all memory components of all indexes
>> in a node.
>> When this budget is fully used, a victim dataset will be chosen. It must
>> be evicted and closed to make space for another dataset.
>> 
>> 
>> Best,
>> Taewoo
>> 
>> On Mon, Sep 12, 2016 at 4:10 PM, Mike Carey <dtab...@gmail.com> wrote:
>> 
>>> +1
>>> 
>>> 
>>> 
>>> On 9/12/16 3:42 PM, Taewoo Kim wrote:
>>> 
>>>> It would be really helpful if this conversation could be folded into the
>>>> description of each parameter. Currently, I think it is too short.
>>>> 
>>>> Best,
>>>> Taewoo
>>>> 
>>>> On Mon, Sep 12, 2016 at 2:19 PM, Jianfeng Jia <jianfeng@gmail.com>
>>>> wrote:
>>>> 
>>>>> Clear. Thanks.
>>>>> 
>>>>> And Ian’s parameters work. I can now have on-disk components of around
>>>>> 128M. Thanks!
>>>>> 
>>>>>> On Sep 12, 2016, at 12:50 PM, Sattam Alsubaiee <salsuba...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>> This is the total memory size given for all datasets. Think of it as
>>>>>> the buffer cache for all memory components of all indexes in that
>>>>>> machine. When it is exhausted, a victim dataset must be evicted and
>>>>>> closed to make space for another dataset.
>>>>>> 
>>>>>> On Mon, Sep 12, 2016 at 12:29 PM, Jianfeng Jia <jianfeng@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> I was a little confused; there is another configuration:
>>>>>>> 
>>>>>>> storage.memorycomponent.globalbudget (which I set to 4G)
>>>>>>> 
>>>>>>> I was thinking this is the budget shared by every component on one
>>>>>>> partition. Is that the case?
>>>>>>> 
>>>>>>>> On Sep 12, 2016, at 12:16 PM, Sattam Alsubaiee <salsuba...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> The 128M is shared by all the memory components of the primary index
>>>>>>>> and all its secondary indexes across all io devices on that node.
>>>>>>>> Also the in-memory components usually have a fill factor of 75%,
>>>>>>>> since the pages are 75% full and the remaining 25% is un-utilized.
>>>>>>>> 
>>>>>>>> The page size that you have set (128KB) looks reasonable for most
>>>>>>>> cases. Your best bet is to increase the value of
>>>>>>>> storage.memorycomponent.numpage to a higher number.

Re: How to set the lsm component size?

2016-09-12 Thread Jianfeng Jia
Clear. Thanks.

And Ian’s parameters work. I can now have on-disk components of around 128M. Thanks!

> On Sep 12, 2016, at 12:50 PM, Sattam Alsubaiee <salsuba...@gmail.com> wrote:
> 
> This is the total memory size given for all datasets. Think of it as the
> buffer cache for all memory components of all indexes in that machine. When
> it is exhausted, a victim dataset must be evicted and closed to make space
> for another dataset.
> 
> On Mon, Sep 12, 2016 at 12:29 PM, Jianfeng Jia <jianfeng@gmail.com>
> wrote:
> 
>> I was a little confused; there is another configuration:
>> 
>> storage.memorycomponent.globalbudget (which I set to 4G)
>> 
>> I was thinking this is the budget shared by every component on one
>> partition. Is that the case?
>> 
>>> On Sep 12, 2016, at 12:16 PM, Sattam Alsubaiee <salsuba...@gmail.com>
>> wrote:
>>> 
>>> The 128M is shared by all the memory components of the primary index and
>>> all its secondary indexes across all io devices on that node.
>>> Also the in-memory components usually have a fill factor of 75%, since
>>> the pages are 75% full and the remaining 25% is un-utilized.
>>> 
>>> The page size that you have set (128KB) looks reasonable for most cases.
>>> Your best bet is to increase the value of storage.memorycomponent.numpage
>>> to a higher number.
>>> 
>>> Sattam
>>> 
>>> 
>>> On Mon, Sep 12, 2016 at 11:33 AM, Jianfeng Jia <jianfeng@gmail.com>
>>> wrote:
>>> 
>>>> Dear devs,
>>>> 
>>>> I’m using the `no-merge` compaction policy and find that the physically
>>>> flushed on-disk component is smaller than I expected.
>>>> 
>>>> Here are my related configurations
>>>> 
>>>> <property>
>>>>   <name>storage.memorycomponent.pagesize</name>
>>>>   <value>128KB</value>
>>>>   <description>The page size in bytes for pages allocated to memory
>>>>   components. (Default = "131072" // 128KB)</description>
>>>> </property>
>>>> 
>>>> <property>
>>>>   <name>storage.memorycomponent.numpages</name>
>>>>   <value>1024</value>
>>>>   <description>The number of pages to allocate for a memory component.
>>>>   (Default = 256)</description>
>>>> </property>
>>>> 
>>>> With these two settings, I’m expecting the LSM component to be 128M.
>>>> However, the flushed one is about 16M~20M. Do we have some compression
>>>> for the on-disk components? If so, that would be good. Otherwise, could
>>>> someone help me increase the component size? Thanks!
>>>> 
>>>> Best,
>>>> 
>>>> Jianfeng Jia
>>>> PhD Candidate of Computer Science
>>>> University of California, Irvine
>>>> 
>>>> 
>> 
>> 
>> 
>> Best,
>> 
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>> 
>> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine



Re: How to set the lsm component size?

2016-09-12 Thread Jianfeng Jia
I was a little confused; there is another configuration:

storage.memorycomponent.globalbudget (which I set to 4G) 

I was thinking this is the budget shared by every component on one partition. 
Is that the case?

> On Sep 12, 2016, at 12:16 PM, Sattam Alsubaiee <salsuba...@gmail.com> wrote:
> 
> The 128M is shared by all the memory components of the primary index and
> all its secondary indexes across all io devices on that node.
> Also the in-memory components usually have a fill factor of 75%, since
> the pages are 75% full and the remaining 25% is un-utilized.
> 
> The page size that you have set (128KB) looks reasonable for most cases. Your
> best bet is to increase the value of storage.memorycomponent.numpage to a
> higher number.
> 
> Sattam
> 
> 
> On Mon, Sep 12, 2016 at 11:33 AM, Jianfeng Jia <jianfeng@gmail.com>
> wrote:
> 
>> Dear devs,
>> 
>> I’m using the `no-merge` compaction policy and find that the physically
>> flushed on-disk component is smaller than I expected.
>> 
>> Here are my related configurations
>> 
>> <property>
>>   <name>storage.memorycomponent.pagesize</name>
>>   <value>128KB</value>
>>   <description>The page size in bytes for pages allocated to memory
>>   components. (Default = "131072" // 128KB)</description>
>> </property>
>> 
>> <property>
>>   <name>storage.memorycomponent.numpages</name>
>>   <value>1024</value>
>>   <description>The number of pages to allocate for a memory component.
>>   (Default = 256)</description>
>> </property>
>> 
>> With these two settings, I’m expecting the LSM component to be 128M.
>> However, the flushed one is about 16M~20M. Do we have some compression for
>> the on-disk components? If so, that would be good. Otherwise, could someone
>> help me increase the component size? Thanks!
>> 
>> Best,
>> 
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>> 
>> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine



How to set the lsm component size?

2016-09-12 Thread Jianfeng Jia
Dear devs,

I’m using the `no-merge` compaction policy and find that the physically 
flushed on-disk component is smaller than I expected. 

Here are my related configurations:

<property>
  <name>storage.memorycomponent.pagesize</name>
  <value>128KB</value>
  <description>The page size in bytes for pages allocated to memory
  components. (Default = "131072" // 128KB)</description>
</property>

<property>
  <name>storage.memorycomponent.numpages</name>
  <value>1024</value>
  <description>The number of pages to allocate for a memory component.
  (Default = 256)</description>
</property>

With these two settings, I’m expecting the LSM component to be 128M. 
However, the flushed one is about 16M~20M. Do we have some compression for the 
on-disk components? If so, that would be good. Otherwise, could someone help me 
increase the component size? Thanks!

Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
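
A back-of-the-envelope check based on Sattam's explanation earlier in this digest: the pagesize x numpages budget is shared by the primary index and all its secondary indexes across all iodevices on the node, and in-memory pages are only ~75% full. The iodevice and index counts below are illustrative assumptions, not values taken from this cluster.

page_size = 128 * 1024          # storage.memorycomponent.pagesize
num_pages = 1024                # storage.memorycomponent.numpages
budget = page_size * num_pages  # 128 MB memory-component budget
fill_factor = 0.75              # in-memory pages are ~75% full

iodevices = 2                   # assumption: iodevices on the node
indexes = 2                     # assumption: primary + one secondary index

per_component = budget * fill_factor / (iodevices * indexes)
print("expected flushed component: %.0f MB" % (per_component / 2**20))
# -> 24 MB, the same order of magnitude as the observed 16M~20M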



Re: Creating RTree: no space left

2016-08-27 Thread Jianfeng Jia
I just tried and found that no “plan” gets printed for creating an index. It 
seems to be just an operator. 
As Taewoo said, we need to check why RTree creation generates so many runs. 

> On Aug 26, 2016, at 3:57 PM, Wail Alkowaileet <wael@gmail.com> wrote:
> 
> @Jianfeng: Sorry for the stupid question. But it seems that the logs and the
> WebUI do not show the plan. Is there a flag for that?
> 
> @Taewoo: I'll look into it and see what's going on. AFAIK, the comparator
> is Hilbert.
> 
> On Fri, Aug 26, 2016 at 7:55 PM, Taewoo Kim <wangs...@gmail.com> wrote:
> 
>> Based on a rough calculation, per partition, each point field takes 3.6GB
>> (16 bytes * 2887453794 records / 12 partition). To sort 3.6GB, we are
>> generating 625 files (96MB or 128MB each) = 157GB. Since Wail mentioned
>> that there was no issue when creating a B+ tree index, we need to check
>> what SORT process is required by R-Tree index.
>> 
>> Best,
>> Taewoo
>> 
>> On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia <jianfeng@gmail.com>
>> wrote:
>> 
>>> If all of the file names start with “ExternalSortRunGenerator”, then they
>>> are the first-round files, which cannot be GCed.
>>> Could you provide the query plan as well?
>>> 
>>>> On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <wael@gmail.com>
>>> wrote:
>>>> 
>>>> Hi Ian and Pouria,
>>>> 
>>>> The names of the files along with their sizes (there were 625 of those
>>>> before crashing):
>>>> 
>>>> size     name
>>>> 96MB ExternalSortRunGenerator8917133039835449370.waf
>>>> 128MB   ExternalSortRunGenerator8948724728025392343.waf
>>>> 
>>>> no files were generated beyond runs.
>>>> compiler.sortmemory = 64MB
>>>> 
>>>> Here is the full logs
>>>> <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
>>>> 
>>>> On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <
>>> pouria.pirza...@gmail.com>
>>>> wrote:
>>>> 
>>>>> We previously had issues with huge spilled sort temp files when
>> creating
>>>>> inverted index for fuzzy queries, but NOT R-Trees.
>>>>> I also recall that Yingyi fixed the issue of delaying clean-up for
>>>>> intermediate temp files until the end of the query execution.
>>>>> If you can share names of a couple of temp files (and their sizes
>> along
>>>>> with the sort memory setting you have in asterix-configuration.xml) we
>>> may
>>>>> be able to have a better guess as if the sort is really going into a
>>>>> two-level merge or not.
>>>>> 
>>>>> Pouria
>>>>> 
>>>>> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <ima...@uci.edu> wrote:
>>>>> 
>>>>>> I think that exception ("No space left on device") is just cast from
>> from
>>>>> the
>>>>>> native IOException. Therefore I would be inclined to believe it's
>>>>> genuinely
>>>>>> out of space. I suppose the question is why the external sort is so
>>> huge.
>>>>>> What is the query plan? Maybe that will shed light on a possible
>> cause.
>>>>>> 
>>>>>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <
>> wael@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> I was monitoring Inodes ... it didn't go beyond 1%.
>>>>>>> 
>>>>>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <
>> wael@gmail.com
>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Chris and Mike,
>>>>>>>> 
>>>>>>>> Actually I was monitoring it to see what's going on:
>>>>>>>> 
>>>>>>>>  - The size of each partition is about 40GB (80GB in total per
>>>>>>>>  iodevice).
>>>>>>>>  - The runs took 157GB per iodevice (about 2x of the dataset size).
>>>>>>>>  Each run takes either 128MB or 96MB of storage.
>>>>>>>>  - At a certain time, there were 522 runs.
>>>>>>>> 
>>>>>>>> I even tried to create a BTree Index to see if that happens as well.
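
Taewoo's rough calculation above can be reproduced directly. Everything below restates figures already quoted in this thread (record count, 16-byte points, 12 partitions, 625 run files of 96-128MB each); it just makes the anomaly visible, since a sort's first-round runs should total roughly the size of its input.

records = 2887453794            # total records, from the thread
point_size = 16                 # bytes per point field
partitions = 12                 # 3 NCs x 4 iodevices

sort_input = records * point_size / partitions
print("point data to sort per partition: %.1f GiB" % (sort_input / 2**30))
# -> ~3.6 GiB, matching Taewoo's estimate

runs_low = 625 * 96 * 2**20     # 625 runs at 96 MB each
runs_high = 625 * 128 * 2**20   # 625 runs at 128 MB each
print("observed run volume: %.0f-%.0f GiB"
      % (runs_low / 2**30, runs_high / 2**30))
# -> ~59-78 GiB just for these 625 runs (the thread reports ~157GB per
#    iodevice in total); far more than sorting the point field alone
#    should spill, which is the open question here.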

Re: Creating RTree: no space left

2016-08-26 Thread Jianfeng Jia
>>>>>>> all files, but worth checking.
>>>>>>> 
>>>>>>> If that's not it, then can you share the full exception and stack
>>>>>>> trace?
>>>>>>> 
>>>>>>> Ceej
>>>>>>> aka Chris Hillery
>>>>>>> 
>>>>>>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <
>>> wael@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> I just cleared the hard drives to get 80% free space. I still get the
>>>>>>>> same issue.
>>>>>>>> 
>>>>>>>> The data contains:
>>>>>>>> 1- 2887453794 records.
>>>>>>>> 2- Schema:
>>>>>>>> 
>>>>>>>> create type CDRType as {
>>>>>>>>   id: uuid,
>>>>>>>>   'date': string,
>>>>>>>>   'time': string,
>>>>>>>>   'duration': int64,
>>>>>>>>   'caller': int64,
>>>>>>>>   'callee': int64,
>>>>>>>>   location: point?
>>>>>>>> }
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <
>>> wael@gmail.com
>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Dears,
>>>>>>>>> 
>>>>>>>>> I have a dataset of size 290GB loaded in 3 NCs, each of which has
>>>>>>>>> 2x500GB SSDs.
>>>>>>>>> 
>>>>>>>>> Each NC has two iodevices (partitions) on each hard drive (i.e.,
>>>>>>>>> the total is 4 iodevices per NC). After loading the data, each
>>>>>>>>> Asterix partition occupied 31GB.
>>>>>>>>> 
>>>>>>>>> The cluster has about 50% free space on each hard drive
>>>>>>>>> (approximately 250GB free space on each). However, when I tried to
>>>>>>>>> create an index of type RTree, I got an exception that no space was
>>>>>>>>> left on the hard drive during the External Sort phase.
>>>>>>>>> 
>>>>>>>>> Is that normal?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 
>>>>>>>>> *Regards,*
>>>>>>>>> Wail Alkowaileet
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> *Regards,*
>>>>>>>> Wail Alkowaileet
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> *Regards,*
>>>>> Wail Alkowaileet
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> *Regards,*
>>>> Wail Alkowaileet
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> 
> *Regards,*
> Wail Alkowaileet



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine



Re: Wishlist

2016-06-18 Thread Jianfeng Jia
Yes, it doesn’t seem right. I’ve moved it up one level. 

> On Jun 18, 2016, at 4:53 PM, Till Westmann <t...@westmann.org> wrote:
> 
> Hi Jianfeng,
> 
> it seems to me that the "wish list" [1] you started is not really a "design 
> doc".
> Should we pull it up one level?
> 
> Cheers,
> Till
> 
> [1] 
> https://cwiki.apache.org/confluence/display/ASTERIXDB/Things+will+be+easier+if+AsterixDB+have+these+features



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine



Remove the apache.org filter when using YourKit

2016-06-01 Thread Jianfeng Jia
Dear devs,

I don’t know if any of you have hit the same problem as I did. When I used 
YourKit to profile the system, it only showed that one method, 
java.lang.Thread.run, took 99% of the time without further details, while 
jvisualvm could give the correct activity hotspots yet is very hard to set up 
remotely. 

It is because the default filter in YourKit skips the “org.apache” packages :-) 
To change the filter: click “Settings” -> “Filters” and uncheck “org.apache”. 
Enjoy profiling Apache code then!


Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine