Re: /rest/jobs visualization
I remember I was using a chrome extension, e.g., https://chrome.google.com/webstore/detail/json-viewer/gbmdgpbipfallnflgajpaliibnhdgobh?hl=en-US, to navigate the JSON record. On Tue, Oct 23, 2018 at 8:51 PM Ian Maxon wrote: > Hey guys, > Does anyone have any ways right now of visualizing the output of the > JSON form of the job-run and job-activity-graph ? I know we used to > have a servlet for it, but since we got rid of that, I know it's been > asked about so I assume it must be used somehow at the moment... > Thanks, > - Ian > -- - Best Regards Jianfeng Jia Ph.D. of Computer Science University of California, Irvine
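For anyone who prefers the command line over a browser plugin, compact JSON returned by the jobs endpoint can also be pretty-printed with a few lines of Python. A minimal sketch; the sample payload below is made up for illustration, not an actual job-run response:

```python
import json

def pretty(raw):
    """Re-indent a compact JSON string so nested job records are readable."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True)

# Hypothetical compact payload, standing in for a /rest/jobs response.
raw = '{"job-id":"JID:0","status":"TERMINATED","activities":[{"id":"ANID:0"}]}'
print(pretty(raw))
```

Equivalently, piping a curl of the endpoint through `python -m json.tool` does the same from a shell.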
Re: Searching for duplicates during feed ingestion.
Got the point now… I would imagine that if the record had a version number, it could potentially solve some problems here. However, it would be a totally different story then… > On May 8, 2017, at 12:39 PM, Mike Carey <dtab...@gmail.com> wrote: > > Note that upserts don't avoid searches (Still need to get the old record > to update secondary indexes from.) > > > On 5/8/17 12:10 PM, Jianfeng Jia wrote: >> Aha, never knew that before. We will definitely try upsert feed next time! >> Thanks for pointing it out! >> >>> On May 8, 2017, at 12:07 PM, Ildar Absalyamov <ildar.absalya...@gmail.com> >>> wrote: >>> >>> I believe we already support upsert feeds ;) >>> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql >>> >>> <https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql> >>>> On May 8, 2017, at 12:04, Jianfeng Jia <jianfeng@gmail.com> wrote: >>>> >>>> I also observe this getting slower problem every-time when we re-ingest >>>> the twitter data. One difference is that the duplicate key could happen, >>>> and we know that is indeed duplicate record. To skip the search, we would >>>> expect an “upsert” logic ( just replace the old one :-) ) instead of an >>>> insert. >>>> >>>> Then maybe we can add some configuration in feed configuration like >>>> >>>> create feed MessageFeed using localfs( >>>> ("format"="adm"), >>>> ("type-name"="typeX"), >>>> ("upsert"="true") >>>> ); >>>> >>>> to indicate that this feed using the upsert logic instead of insert. >>>> >>>> One thing we need to confirm is that if “upsert” is actually implemented >>>> in a no-search fashion? >>>> Based on the way we searching the components, only the most recent one >>>> will be popped out. Then blindly insert should be OK logically. Correct me >>>> if I missed some other cases (highly likely :-)).
>>>> >>>> >>>>> On May 8, 2017, at 11:05 AM, Mike Carey <dtab...@gmail.com> wrote: >>>>> >>>>> +0.99 from me. >>>>> >>>>> >>>>> On 5/8/17 9:50 AM, Taewoo Kim wrote: >>>>>> +1 for auto-generated ID case >>>>>> >>>>>> Best, >>>>>> Taewoo >>>>>> >>>>>> On Mon, May 8, 2017 at 8:57 AM, Yingyi Bu <buyin...@gmail.com> wrote: >>>>>> >>>>>>> Abdullah has a pending change that disables searches if there's no >>>>>>> secondary indexes [1]. >>>>>>> Auto-generated ID could be another case for which we can disable >>>>>>> searches >>>>>>> as well. >>>>>>> >>>>>>> Best, >>>>>>> Yingyi >>>>>>> >>>>>>> [1] https://asterix-gerrit.ics.uci.edu/#/c/1711/ >>>>>>> >>>>>>> >>>>>>> On Mon, May 8, 2017 at 4:30 AM, Wail Alkowaileet <wael@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Devs, >>>>>>>> >>>>>>>> I'm noticing a behavior during the ingestion is that it's getting >>>>>>>> slower >>>>>>> by >>>>>>>> time. I know that is an expected behavior in LSM-indexes. But what I'm >>>>>>>> seeing is that I can notice the drop in ingestion rate roughly after >>>>>>> having >>>>>>>> 10 components (around ~13 GB). That's what I'm not sure if it's >>>>>>>> expected? >>>>>>>> >>>>>>>> I tried multiple setups (increasing Memory component size + >>>>>>>> max-mergable-component-size). All of which delayed the problem but not >>>>>>>> solved it. The only part I've never changed is the bloom-filter >>>>>>>> false-positive rate (1%). Which I want to investigate next. >>>>>>>> >>>>>>>> So.. >>>>>>>> What I want to suggest is that when the primary key is auto-generated, >>>>>>> why >>>>>>>> AsterixDB looks for duplicates? it seems a wasteful operation to me. >>>>>>> Also, >>>>>>>> can we give the user the ability to tell the index that all keys are >>>>>>> unique >>>>>>>> ? I know I should not trust the user .. but in certain cases, probably >>>>>>> the >>>>>>>> user is certain that the key is unique. 
Or a more elegant solution can >>>>>>>> shine in the end :-) >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> *Regards,* >>>>>>>> Wail Alkowaileet >>>>>>>> >>> Best regards, >>> Ildar >>> >
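The blind-insert argument in this thread (point lookups probe LSM components newest-to-oldest and stop at the first hit, so writing a duplicate key only into the newest component shadows the older copy) can be sketched in a few lines. This is a toy model for the reasoning, not AsterixDB code:

```python
# Toy LSM: components[0] is the newest (in-memory) component.
components = [{}, {"k1": "old"}]  # an older component already holds k1

def lookup(key):
    # Probe newest-to-oldest; the first component containing the key wins.
    for comp in components:
        if key in comp:
            return comp[key]
    return None

def blind_upsert(key, value):
    # No search: just write into the newest component.
    components[0][key] = value

blind_upsert("k1", "new")
print(lookup("k1"))  # prints "new": the newer copy shadows the old one
```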
Re: Searching for duplicates during feed ingestion.
Aha, never knew that before. We will definitely try upsert feed next time! Thanks for pointing it out! > On May 8, 2017, at 12:07 PM, Ildar Absalyamov <ildar.absalya...@gmail.com> > wrote: > > I believe we already support upsert feeds ;) > https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql > > <https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/runtimets/queries/feeds/upsert-feed/upsert-feed.1.ddl.aql> >> On May 8, 2017, at 12:04, Jianfeng Jia <jianfeng@gmail.com> wrote: >> >> I also observe this getting slower problem every-time when we re-ingest the >> twitter data. One difference is that the duplicate key could happen, and we >> know that is indeed duplicate record. To skip the search, we would expect an >> “upsert” logic ( just replace the old one :-) ) instead of an insert. >> >> Then maybe we can add some configuration in feed configuration like >> >> create feed MessageFeed using localfs( >> ("format"="adm"), >> ("type-name"="typeX"), >> ("upsert"="true") >> ); >> >> to indicate that this feed using the upsert logic instead of insert. >> >> One thing we need to confirm is that if “upsert” is actually implemented in >> a no-search fashion? >> Based on the way we searching the components, only the most recent one will >> be popped out. Then blindly insert should be OK logically. Correct me if I >> missed some other cases (highly likely :-)). >> >> >>> On May 8, 2017, at 11:05 AM, Mike Carey <dtab...@gmail.com> wrote: >>> >>> +0.99 from me. >>> >>> >>> On 5/8/17 9:50 AM, Taewoo Kim wrote: >>>> +1 for auto-generated ID case >>>> >>>> Best, >>>> Taewoo >>>> >>>> On Mon, May 8, 2017 at 8:57 AM, Yingyi Bu <buyin...@gmail.com> wrote: >>>> >>>>> Abdullah has a pending change that disables searches if there's no >>>>> secondary indexes [1]. >>>>> Auto-generated ID could be another case for which we can disable searches >>>>> as well. 
>>>>> >>>>> Best, >>>>> Yingyi >>>>> >>>>> [1] https://asterix-gerrit.ics.uci.edu/#/c/1711/ >>>>> >>>>> >>>>> On Mon, May 8, 2017 at 4:30 AM, Wail Alkowaileet <wael@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi Devs, >>>>>> >>>>>> I'm noticing a behavior during the ingestion is that it's getting slower >>>>> by >>>>>> time. I know that is an expected behavior in LSM-indexes. But what I'm >>>>>> seeing is that I can notice the drop in ingestion rate roughly after >>>>> having >>>>>> 10 components (around ~13 GB). That's what I'm not sure if it's expected? >>>>>> >>>>>> I tried multiple setups (increasing Memory component size + >>>>>> max-mergable-component-size). All of which delayed the problem but not >>>>>> solved it. The only part I've never changed is the bloom-filter >>>>>> false-positive rate (1%). Which I want to investigate next. >>>>>> >>>>>> So.. >>>>>> What I want to suggest is that when the primary key is auto-generated, >>>>> why >>>>>> AsterixDB looks for duplicates? it seems a wasteful operation to me. >>>>> Also, >>>>>> can we give the user the ability to tell the index that all keys are >>>>> unique >>>>>> ? I know I should not trust the user .. but in certain cases, probably >>>>> the >>>>>> user is certain that the key is unique. Or a more elegant solution can >>>>>> shine in the end :-) >>>>>> >>>>>> -- >>>>>> >>>>>> *Regards,* >>>>>> Wail Alkowaileet >>>>>> >>> >> > > Best regards, > Ildar >
Re: Searching for duplicates during feed ingestion.
I also observe this getting-slower problem every time we re-ingest the twitter data. One difference is that duplicate keys can happen, and we know it is indeed a duplicate record. To skip the search, we would expect “upsert” logic ( just replace the old one :-) ) instead of an insert. Then maybe we can add some configuration in the feed configuration, like create feed MessageFeed using localfs( ("format"="adm"), ("type-name"="typeX"), ("upsert"="true") ); to indicate that this feed uses the upsert logic instead of insert. One thing we need to confirm is whether “upsert” is actually implemented in a no-search fashion. Based on the way we search the components, only the most recent one will be popped out, so a blind insert should be OK logically. Correct me if I missed some other cases (highly likely :-)). > On May 8, 2017, at 11:05 AM, Mike Carey wrote: > > +0.99 from me. > > > On 5/8/17 9:50 AM, Taewoo Kim wrote: >> +1 for auto-generated ID case >> >> Best, >> Taewoo >> >> On Mon, May 8, 2017 at 8:57 AM, Yingyi Bu wrote: >> >>> Abdullah has a pending change that disables searches if there's no >>> secondary indexes [1]. >>> Auto-generated ID could be another case for which we can disable searches >>> as well. >>> >>> Best, >>> Yingyi >>> >>> [1] https://asterix-gerrit.ics.uci.edu/#/c/1711/ >>> >>> >>> On Mon, May 8, 2017 at 4:30 AM, Wail Alkowaileet >>> wrote: >>> Hi Devs, I'm noticing a behavior during the ingestion is that it's getting slower >>> by time. I know that is an expected behavior in LSM-indexes. But what I'm seeing is that I can notice the drop in ingestion rate roughly after >>> having 10 components (around ~13 GB). That's what I'm not sure if it's expected? I tried multiple setups (increasing Memory component size + max-mergable-component-size). All of which delayed the problem but not solved it. The only part I've never changed is the bloom-filter false-positive rate (1%). Which I want to investigate next. So..
What I want to suggest is that when the primary key is auto-generated, >>> why AsterixDB looks for duplicates? it seems a wasteful operation to me. >>> Also, can we give the user the ability to tell the index that all keys are >>> unique ? I know I should not trust the user .. but in certain cases, probably >>> the user is certain that the key is unique. Or a more elegant solution can shine in the end :-) -- *Regards,* Wail Alkowaileet >
Re: What is the new path to check Hyracks jobs status in AsterixDB?
That’s a good idea. I hadn’t thought about the browser plugin. Now it looks better! > On Apr 7, 2017, at 4:34 PM, Till Westmann <ti...@apache.org> wrote: > > Since the endpoints return JSON, using a JSON formatter plugin for the > browser seems easier. > Otherwise I think that we’ll need to create a page around it (which is > clearly feasible as well). > > On 7 Apr 2017, at 16:21, Mike Carey wrote: > >> Could we use the same library that Xikui used for JSON (formatted) as a baby >> step? >> >> >> On 4/7/17 10:04 AM, Jianfeng Jia wrote: >>> Got it. (do we have any plan to beautify the UI? :-) >>> Thanks! >>> >>>> On Apr 7, 2017, at 9:46 AM, Yingyi Bu <buyin...@gmail.com> wrote: >>>> >>>> Hi Jianfeng, >>>> >>>> The admin console has been removed but the REST APIs which return JSON >>>> results are still there. >>>> >>>> Let's take the sample cluster as an example. >>>> To check nodes: >>>> http://localhost:16001/rest/nodes/ >>>> http://localhost:16001/rest/nodes/red >>>> http://localhost:16001/rest/nodes/blue >>>> >>>> To check jobs: >>>> http://localhost:16001/rest/jobs/ >>>> http://localhost:16001/rest/jobs/JID:0/job-run >>>> >>>> Best, >>>> Yingyi >>>> >>>> >>>> On Thu, Apr 6, 2017 at 5:18 PM, Jianfeng Jia <jianfeng@gmail.com> >>>> wrote: >>>> >>>>> Dear Devs, >>>>> >>>>> We used to have a Hyracks adminconsole web page xxx:/adminconsole (or >>>>> on 16001 port if not using managix) which can watch the details of the >>>>> recent jobs. By click into each job we can know Activity Cluster Graph/Job >>>>> Timeline etc. >>>>> >>>>> It’s very useful to have a overview of the current system workload (e.g., >>>>> how many queries are running, when did they submit, how long it has ran >>>>> …). >>>>> Right now, the same link returns a following error. >>>>> page can’t be found >>>>> >>>>> I’m wondering what is the new path to get the same information? Thanks!
>>>>> >>>>> >>>>> >>>>> Best, >>>>> >>>>> Jianfeng Jia >>>>> PhD Candidate of Computer Science >>>>> University of California, Irvine >>>>> >>>>>
Re: What is the new path to check Hyracks jobs status in AsterixDB?
Got it. (do we have any plan to beautify the UI? :-) Thanks! > On Apr 7, 2017, at 9:46 AM, Yingyi Bu <buyin...@gmail.com> wrote: > > Hi Jianfeng, > > The admin console has been removed but the REST APIs which return JSON > results are still there. > > Let's take the sample cluster as an example. > To check nodes: > http://localhost:16001/rest/nodes/ > http://localhost:16001/rest/nodes/red > http://localhost:16001/rest/nodes/blue > > To check jobs: > http://localhost:16001/rest/jobs/ > http://localhost:16001/rest/jobs/JID:0/job-run > > Best, > Yingyi > > > On Thu, Apr 6, 2017 at 5:18 PM, Jianfeng Jia <jianfeng@gmail.com> wrote: > >> Dear Devs, >> >> We used to have a Hyracks adminconsole web page xxx:/adminconsole (or >> on 16001 port if not using managix) which can watch the details of the >> recent jobs. By click into each job we can know Activity Cluster Graph/Job >> Timeline etc. >> >> It’s very useful to have a overview of the current system workload (e.g., >> how many queries are running, when did they submit, how long it has ran …). >> Right now, the same link returns a following error. >> page can’t be found >> >> I’m wondering what is the new path to get the same information? Thanks! >> >> >> >> Best, >> >> Jianfeng Jia >> PhD Candidate of Computer Science >> University of California, Irvine >> >>
What is the new path to check Hyracks jobs status in AsterixDB?
Dear Devs, We used to have a Hyracks adminconsole web page xxx:/adminconsole (or on the 16001 port if not using managix) which showed the details of the recent jobs. By clicking into each job we could see the Activity Cluster Graph/Job Timeline etc. It’s very useful to have an overview of the current system workload (e.g., how many queries are running, when they were submitted, how long they have run …). Right now, the same link returns the following error: page can’t be found. I’m wondering what the new path is to get the same information? Thanks! Best, Jianfeng Jia PhD Candidate of Computer Science University of California, Irvine
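Given the REST endpoints mentioned earlier in this thread (http://localhost:16001/rest/jobs/ and http://localhost:16001/rest/jobs/JID:0/job-run), a small script can poll job status instead of the old admin console. A hedged sketch: the port, path shape, and the response fields used below are assumptions based on this thread, not a documented API:

```python
import json

def job_run_url(host, port, job_id):
    # e.g. job_run_url("localhost", 16001, "JID:0")
    return "http://%s:%d/rest/jobs/%s/job-run" % (host, port, job_id)

def summarize(raw):
    """Pick a few (assumed) fields out of a job-run JSON payload."""
    doc = json.loads(raw)
    return "%s: %s" % (doc.get("job-id", "?"), doc.get("status", "?"))

# A canned payload standing in for a real response; against a live
# cluster, fetch with urllib.request.urlopen(job_run_url(...)).
print(summarize('{"job-id": "JID:0", "status": "TERMINATED"}'))
```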
Re: Add the Java driver for Asterix
Hi Jabbar, It’s nice that you are interested in the project. @devs, here is the issue I posted for GSOC2016 https://issues.apache.org/jira/browse/ASTERIXDB-1369 <https://issues.apache.org/jira/browse/ASTERIXDB-1369>, which I manually updated for GSOC2017. I feel the need, but I’m not an expert on JDBC. Do you think it’s a good project, and would anyone interested be willing to give detailed instructions on it? Otherwise, I will close this issue and give it a very low priority. > On Mar 3, 2017, at 10:19 AM, jabbar memon <memonjabb...@gmail.com> wrote: > > Hi Jianfeng Jia, > i am postgraduate at Dhirubhai Ambani Institute of Information and > Communication technology,Ahmedabad.I'd like to contribute to this project > in the GSOC 2017. I have a good knowledge about Java. I'd like to know more > about this project.And it is new for me so it will be great challenging and > excitement. > > Thanks > Jabbar Memon
Re: Choosing defaults for AsterixDB
Hi, I want to pick up this thread to verify whether AQL will still be supported in the future. Currently, Cloudberry automatically translates JSON requests into AQL statements, and it would be hard work to switch to SQL++. I don’t object to setting the default option to SQL++. However, we will keep the support for AQL, right? (especially in the REST API). > On Jan 10, 2017, at 5:47 PM, Till Westmann wrote: > > Ok, since there’s a lot of agreement and no concerns, I’ll go ahead. > > Thanks, > Till > > On 10 Jan 2017, at 9:22, Yingyi Bu wrote: > >> +100! >> >> On Tue, Jan 10, 2017 at 9:17 AM, Mike Carey wrote: >> >>> +1 from me too for SQL++ and clean JSON. >>> >>> >>> >>> On 1/10/17 8:25 AM, Murtadha Hubail wrote: >>> +1 to SQL++ and clean JSON. Cheers, Murtadha On Jan 10, 2017, at 9:46 AM, Till Westmann wrote: > > Hi, > > as you know AsterixDB supports 2 query languages (AQL and SQL++) and many > output formats (ADM, clean JSON, lossless JSON, CSV). Our current > defaults > for these options (at least on the web interface) are AQL and ADM. > > I’d like to propose to change those defaults to be SQL++ and (clean) > JSON. > The reason for wanting them to change, is that I think that these choices > are more attractive to new users of the system and thus can help to > increase > the adoption of AsterixDB. A user with some database experience is much > more > likely to have previous SQL experience and to feel at home with SQL++ > than > having XQuery experience and feeling at home with AQL. Similarly, most > users > will want to use the data that they get out of AsterixDB in an > application > and it will be a lot easier to consume JSON than it is to consume ADM. > > I've prepared a (tiny) change to change the defaults [1] and I'm > wondering > if there are concerns that should keep us from making this change. > > Cheers, > Till > > [1] https://asterix-gerrit.ics.uci.edu/#/c/1409/ > >>>
Re: [VOTE] Release Apache AsterixDB 0.9.0 and Hyracks 0.3.0 (RC2)
+1 - signatures and hash checks - source compilation works - nc service works > On Jan 21, 2017, at 6:03 PM, Yingyi Bu wrote: > > +1 > > - signatures and hashes of all 5 archives ok > - nc service binary works > - version api agrees with the commit id on ASF repo > - source compilation works > > Best, > Yingyi > > > On Sat, Jan 21, 2017 at 9:16 AM, Steven Jacobs wrote: > >> +1 >> Steven >> >> On Sat, Jan 21, 2017 at 7:36 AM Till Westmann wrote: >> >>> +1 >>> >>> >>> >>> - signature and hashes of all 5 archives ok >>> >>> - source archives agree with commit ids >>> >>> - LICENSE + NOTICE look good for all archives >>> >>> - source files have headers >>> >>> - no unexpected binaries >>> >>> - compilation works >>> >>> >>> >>> Till >>> >>> >>> >>> On 19 Jan 2017, at 4:50, Ian Maxon wrote: >>> >>> >>> Hi again everyone, >>> >>> Please verify and vote on the first non-incubating Apache AsterixDB >>> Release! >>> This 2nd RC addresses build issues noticed in the previous RC, along >>> with >>> some minor license tweaks. >>> This release utilizes a series of improvements around the actual >>> release >>> process that will hopefully shorten the interval between releases. A >>> further email detailing the features contained in this release as >>> compared >>> to the previous incubating release will be forthcoming once a suitable >>> RC >>> passes voting.
>>> >>> The tags to be voted on are: >>> >>> apache-asterixdb-0.9.0-rc2 >>> commit: 4383bdde78c02d597be65ecf467c5a7df85a2055 >>> link: >>> >>> https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;a= >> tag;h=refs/tags/apache-asterixdb-0.9.0-rc2 >>> >>> and >>> >>> apache-hyracks-0.3.0-rc2 >>> commit: def643d586b62b2616b8ab8e6fc3ba598cf5ad67 >>> link: >>> >>> https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;a= >> tag;h=refs/tags/apache-hyracks-0.3.0-rc2 >>> >>> The artifacts, sha1's, and signatures are (for each artifact), are at: >>> >>> AsterixDB Source >>> https://dist.apache.org/repos/dist/dev/asterixdb/apache- >>> asterixdb-0.9.0-source-release.zip >>> https://dist.apache.org/repos/dist/dev/asterixdb/apache- >>> asterixdb-0.9.0-source-release.zip.asc >>> https://dist.apache.org/repos/dist/dev/asterixdb/apache- >>> asterixdb-0.9.0-source-release.zip.sha1 >>> >>> SHA1: 49f8df822c6273a310027d3257a79afb45c8d446 >>> >>> Hyracks Source >>> https://dist.apache.org/repos/dist/dev/asterixdb/apache- >>> hyracks-0.3.0-source-release.zip >>> https://dist.apache.org/repos/dist/dev/asterixdb/apache- >>> hyracks-0.3.0-source-release.zip.asc >>> https://dist.apache.org/repos/dist/dev/asterixdb/apache- >>> hyracks-0.3.0-source-release.zip.sha1 >>> >>> SHA1: 4d042cab164347f0cc5cc1cfb3da8d4f02eea1de >>> >>> AsterixDB NCService Installer: >>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix- >>> server-0.9.0-binary-assembly.zip >>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix- >>> server-0.9.0-binary-assembly.zip.asc >>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix- >>> server-0.9.0-binary-assembly.zip.sha1 >>> >>> SHA1: 46c4cc3dc09e915d4b1bc6f912faef389488fdb6 >>> >>> AsterixDB Managix Installer >>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix- >>> installer-0.9.0-binary-assembly.zip >>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix- >>> installer-0.9.0-binary-assembly.zip.asc >>> 
https://dist.apache.org/repos/dist/dev/asterixdb/asterix- >>> installer-0.9.0-binary-assembly.zip.sha1 >>> >>> SHA1: 41497dbadb0ad281ba0a10ee87eaa5f7afa78cef >>> >>> AsterixDB YARN Installer >>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix- >>> yarn-0.9.0-binary-assembly.zip >>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix- >>> yarn-0.9.0-binary-assembly.zip.asc >>> https://dist.apache.org/repos/dist/dev/asterixdb/asterix- >>> yarn-0.9.0-binary-assembly.zip.sha1 >>> >>> SHA1: 3ade0d2957e7f3e465e357aced6712ef72598613 >>> >>> Additionally, a staged maven repository is available at: >>> >>> >>> https://repository.apache.org/content/repositories/ >> orgapacheasterix-1024/ >>> >>> The KEYS file containing the PGP keys used to sign the release can be >>> found at >>> >>> https://dist.apache.org/repos/dist/release/asterixdb/KEYS >>> >>> RAT was executed as part of Maven via the RAT maven plugin, but >>> excludes files that are: >>> >>> - data for tests
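The checks voters report in this thread (signature and hash verification) can be partly automated. A minimal sketch of the SHA1 check in Python, assuming the artifact and its expected digest (copied from the vote email) are already downloaded; signature verification still needs `gpg --verify` against the KEYS file separately:

```python
import hashlib

def sha1_of(path, bufsize=1 << 20):
    """Stream a file through SHA-1 and return the hex digest."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected digest from the vote email (AsterixDB source archive).
expected = "49f8df822c6273a310027d3257a79afb45c8d446"
# verify: sha1_of("apache-asterixdb-0.9.0-source-release.zip") == expected
```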
Re: How to set the lsm component size?
Nice! > On Oct 13, 2016, at 2:56 PM, Taewoo Kim <wangs...@gmail.com> wrote: > > The explanation changes for the two parameters have been merged into the > master. > > https://asterix-gerrit.ics.uci.edu/#/c/1281/3/asterixdb/asterix-installer/src/main/resources/conf/asterix-configuration.xml > > Best, > Taewoo > > On Mon, Sep 12, 2016 at 5:02 PM, Taewoo Kim <wangs...@gmail.com> wrote: > >> Thanks to Sattam, here is the revised version. Feel free to revise this. I >> will upload a patch set after some revision is done. >> >> *storage.memorycomponent.numpages* >> >> The number of pages to allocate for a memory component. (Default = 256) >> This budget is shared by all the memory components of the primary index >> and all its secondary indexes across all I/O devices on a node. >> Note: in-memory components usually has fill factor of 75% since the pages >> are 75% full and the remaining 25% is un-utilized. >> >> >> *storage.memorycomponent.globalbudget* >> >> [4GB + 100MB] The total size of memory in bytes that the sum of all open >> memory components cannot exceed. (Default = 512MB) >> Consider this as the buffer cache for all memory components of all indexes >> in a node. >> When this budget is fully used, a victim dataset will be chosen. It must >> be evicted and closed to make a space for another dataset. >> >> >> Best, >> Taewoo >> >> On Mon, Sep 12, 2016 at 4:10 PM, Mike Carey <dtab...@gmail.com> wrote: >> >>> +1 >>> >>> >>> >>> On 9/12/16 3:42 PM, Taewoo Kim wrote: >>> >>>> It would be really helpful this conversation can be applied in the >>>> description of each parameter. Currently, I think that is too short. >>>> >>>> Best, >>>> Taewoo >>>> >>>> On Mon, Sep 12, 2016 at 2:19 PM, Jianfeng Jia <jianfeng@gmail.com> >>>> wrote: >>>> >>>> Clear. Thanks. >>>>> >>>>> And Ian’s parameters works. I can have a on-disk components around 128M. >>>>> Thanks! 
>>>>> >>>>> On Sep 12, 2016, at 12:50 PM, Sattam Alsubaiee <salsuba...@gmail.com> >>>>>> >>>>> wrote: >>>>> >>>>>> This is the total memory size given for all datasets. Think of it as >>>>>> the >>>>>> buffer cache for all memory components of all indexes in that machine. >>>>>> >>>>> When >>>>> >>>>>> it is exhausted, a victim dataset must be evicted and closed to have a >>>>>> space for another dataset. >>>>>> >>>>>> On Mon, Sep 12, 2016 at 12:29 PM, Jianfeng Jia <jianfeng@gmail.com >>>>>>> >>>>>> wrote: >>>>>> >>>>>> I was a little confused, there is another configuration: >>>>>>> >>>>>>> storage.memorycomponent.globalbudget ( which I set to 4G) >>>>>>> >>>>>>> I was thinking this is the budget that every component on one >>>>>>> partition >>>>>>> >>>>>> is >>>>> >>>>>> shared. Is that the case? >>>>>>> >>>>>>> On Sep 12, 2016, at 12:16 PM, Sattam Alsubaiee <salsuba...@gmail.com> >>>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>>> The 128M is shared by all the memory components of the primary index >>>>>>>> >>>>>>> and >>>>> >>>>>> all its secondary indexes across all io devices on that node. >>>>>>>> Also the in-memory components usually usually has fill factor of 75% >>>>>>>> >>>>>>> since >>>>>>> >>>>>>>> the pages are 75% full and the remaining 25% is un-utilized. >>>>>>>> >>>>>>>> The page size that you have set 128KB looks reasonable for most >>>>>>>> cases. >>>>>>>> >>>>>>> Your >>>>>>> >>>>>>>> best bet is to increase the value of storage.memorycomponent.numpage >>>>>>>> >>>>>>> to >>>>> >>>>>> a >&g
Re: How to set the lsm component size?
Clear. Thanks. And Ian’s parameters works. I can have a on-disk components around 128M. Thanks! > On Sep 12, 2016, at 12:50 PM, Sattam Alsubaiee <salsuba...@gmail.com> wrote: > > This is the total memory size given for all datasets. Think of it as the > buffer cache for all memory components of all indexes in that machine. When > it is exhausted, a victim dataset must be evicted and closed to have a > space for another dataset. > > On Mon, Sep 12, 2016 at 12:29 PM, Jianfeng Jia <jianfeng@gmail.com> > wrote: > >> I was a little confused, there is another configuration: >> >> storage.memorycomponent.globalbudget ( which I set to 4G) >> >> I was thinking this is the budget that every component on one partition is >> shared. Is that the case? >> >>> On Sep 12, 2016, at 12:16 PM, Sattam Alsubaiee <salsuba...@gmail.com> >> wrote: >>> >>> The 128M is shared by all the memory components of the primary index and >>> all its secondary indexes across all io devices on that node. >>> Also the in-memory components usually usually has fill factor of 75% >> since >>> the pages are 75% full and the remaining 25% is un-utilized. >>> >>> The page size that you have set 128KB looks reasonable for most cases. >> Your >>> best bet is to increase the value of storage.memorycomponent.numpage to >> a >>> higher number. >>> >>> Sattam >>> >>> >>> On Mon, Sep 12, 2016 at 11:33 AM, Jianfeng Jia <jianfeng@gmail.com> >>> wrote: >>> >>>> Dear devs, >>>> >>>> I’m using the `no-merge` compaction policy and find that the physical >>>> flushed on-disk component is smaller than I was expected. >>>> >>>> Here are my related configurations >>>> >>>> >>>> storage.memorycomponent.pagesize >>>> 128KB >>>> The page size in bytes for pages allocated to memory >>>> components. (Default = "131072" // 128KB) >>>> >>>> >>>> >>>> >>>> storage.memorycomponent.numpages >>>> 1024 >>>> The number of pages to allocate for a memory component. 
>>>> (Default = 256) >>>> >>>> >>>> >>>> With these two settings, I’m expecting the lsm component should be 128M. >>>> However, the flushed one is about 16M~ 20M. Do we have some compression >> for >>>> the on-disk components? If so, it will be good. Otherwise, could someone >>>> help me to increase the component size? Thanks! >>>> >>>> Best, >>>> >>>> Jianfeng Jia >>>> PhD Candidate of Computer Science >>>> University of California, Irvine >>>> >>>> >> >> >> >> Best, >> >> Jianfeng Jia >> PhD Candidate of Computer Science >> University of California, Irvine >> >> Best, Jianfeng Jia PhD Candidate of Computer Science University of California, Irvine
Re: How to set the lsm component size?
I was a little confused, there is another configuration: storage.memorycomponent.globalbudget ( which I set to 4G) I was thinking this is the budget that every component on one partition is shared. Is that the case? > On Sep 12, 2016, at 12:16 PM, Sattam Alsubaiee <salsuba...@gmail.com> wrote: > > The 128M is shared by all the memory components of the primary index and > all its secondary indexes across all io devices on that node. > Also the in-memory components usually usually has fill factor of 75% since > the pages are 75% full and the remaining 25% is un-utilized. > > The page size that you have set 128KB looks reasonable for most cases. Your > best bet is to increase the value of storage.memorycomponent.numpage to a > higher number. > > Sattam > > > On Mon, Sep 12, 2016 at 11:33 AM, Jianfeng Jia <jianfeng@gmail.com> > wrote: > >> Dear devs, >> >> I’m using the `no-merge` compaction policy and find that the physical >> flushed on-disk component is smaller than I was expected. >> >> Here are my related configurations >> >> >>storage.memorycomponent.pagesize >>128KB >>The page size in bytes for pages allocated to memory >> components. (Default = "131072" // 128KB) >> >> >> >> >>storage.memorycomponent.numpages >>1024 >>The number of pages to allocate for a memory component. >> (Default = 256) >> >> >> >> With these two settings, I’m expecting the lsm component should be 128M. >> However, the flushed one is about 16M~ 20M. Do we have some compression for >> the on-disk components? If so, it will be good. Otherwise, could someone >> help me to increase the component size? Thanks! >> >> Best, >> >> Jianfeng Jia >> PhD Candidate of Computer Science >> University of California, Irvine >> >> Best, Jianfeng Jia PhD Candidate of Computer Science University of California, Irvine
How to set the lsm component size?
Dear devs, I’m using the `no-merge` compaction policy and find that the physically flushed on-disk component is smaller than I expected. Here are my related configurations: storage.memorycomponent.pagesize = 128KB (the page size in bytes for pages allocated to memory components; default = "131072" // 128KB) and storage.memorycomponent.numpages = 1024 (the number of pages to allocate for a memory component; default = 256). With these two settings, I’m expecting the LSM component to be 128M. However, the flushed one is about 16M~20M. Do we have some compression for the on-disk components? If so, that would be good. Otherwise, could someone help me increase the component size? Thanks! Best, Jianfeng Jia PhD Candidate of Computer Science University of California, Irvine
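The arithmetic behind this question, and behind the earlier answers (the budget is shared by the primary index and its secondary indexes across io devices, and in-memory pages are only ~75% full), can be made explicit. A back-of-the-envelope sketch; the index and iodevice counts are illustrative assumptions, not values from a real config:

```python
KB, MB = 1024, 1024 * 1024

pagesize = 128 * KB
numpages = 1024           # raised from the default of 256
budget = pagesize * numpages
print(budget // MB)       # 128 (MB): the naive expectation

# The budget is shared across indexes and io devices, and in-memory
# pages are ~75% full, so each flushed component is much smaller.
num_indexes, num_iodevices, fill_factor = 2, 2, 0.75  # illustrative
per_component = budget / (num_indexes * num_iodevices) * fill_factor
print(int(per_component // MB))  # 24 (MB): the 16M~20M ballpark observed
```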
Re: Creating RTree: no space left
I just tried, and found that no “plan” gets printed for creating an index. It seems to be just one operator. As Taewoo said, we need to check why R-tree creation generates so many runs. > On Aug 26, 2016, at 3:57 PM, Wail Alkowaileet <wael@gmail.com> wrote: > > @Jianfeng: Sorry for the stupid questio. But it seems that the logs and the > WebUI does not show the plan. Is there a flag for that? > > @Taewoo: I'll look into it and see what's going on. AFAIK, the comparator > is Hilbert. > > On Fri, Aug 26, 2016 at 7:55 PM, Taewoo Kim <wangs...@gmail.com> wrote: > >> Based on a rough calculation, per partition, each point field takes 3.6GB >> (16 bytes * 2887453794 records / 12 partition). To sort 3.6GB, we are >> generating 625 files (96MB or 128MB each) = 157GB. Since Wail mentioned >> that there was no issue when creating a B+ tree index, we need to check >> what SORT process is required by R-Tree index. >> >> Best, >> Taewoo >> >> On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia <jianfeng@gmail.com> >> wrote: >> >>> If all of the file names start with “ExternalSortRunGenerator”, then they >>> are the first round files which can not be GCed. >>> Could you provide the query plan as well? >>> >>>> On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <wael@gmail.com> >>> wrote: >>>> >>>> Hi Ian and Pouria, >>>> >>>> The name of the files along with the sizes (there were 625 one of those >>>> before crashing): >>>> >>>> sizename >>>> 96MB ExternalSortRunGenerator8917133039835449370.waf >>>> 128MB ExternalSortRunGenerator8948724728025392343.waf >>>> >>>> no files were generated beyond runs.
>>>> compiler.sortmemory = 64MB >>>> >>>> Here are the full logs >>>> <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_ >>> 25_07%3A34%3A52_AST_2016.zip?dl=0> >>>> >>>> On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh < >>> pouria.pirza...@gmail.com> >>>> wrote: >>>> >>>>> We previously had issues with huge spilled sort temp files when >> creating >>>>> inverted indexes for fuzzy queries, but NOT R-Trees. >>>>> I also recall that Yingyi fixed the issue of delaying clean-up for >>>>> intermediate temp files until the end of the query execution. >>>>> If you can share names of a couple of temp files (and their sizes >> along >>>>> with the sort memory setting you have in asterix-configuration.xml) we >>> may >>>>> be able to have a better guess as to whether the sort is really going into a >>>>> two-level merge or not. >>>>> >>>>> Pouria >>>>> >>>>> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <ima...@uci.edu> wrote: >>>>> >>>>>> I think that exception ("No space left on device") is just cast >> from >>>>> the >>>>>> native IOException. Therefore I would be inclined to believe it's >>>>> genuinely >>>>>> out of space. I suppose the question is why the external sort is so >>> huge. >>>>>> What is the query plan? Maybe that will shed light on a possible >> cause. >>>>>> >>>>>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet < >> wael@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> I was monitoring Inodes ... it didn't go beyond 1%. >>>>>>> >>>>>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet < >> wael@gmail.com >>>> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Chris and Mike, >>>>>>>> >>>>>>>> Actually I was monitoring it to see what's going on: >>>>>>>> >>>>>>>> - The size of each partition is about 40GB (80GB in total per >>>>>>>> iodevice). >>>>>>>> - The runs took 157GB per iodevice (about 2x of the dataset >> size). >>>>>>>> Each run takes either 128MB or 96MB of storage. >>>>>>>> - At a certain time, there were 522 runs.
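A back-of-the-envelope check on Taewoo's numbers (a sketch only; it assumes each spilled run holds roughly one sort-memory budget of input, which the observed 96MB/128MB run files suggest is only approximately how the run generator behaves):

```python
# Rough estimate of the external-sort input and run count per partition.
records = 2_887_453_794
point_bytes = 16              # a point is two 8-byte doubles
partitions = 12
sort_memory = 64 * 2**20      # compiler.sortmemory = 64MB

per_partition = records * point_bytes / partitions
runs = per_partition / sort_memory

print(round(per_partition / 2**30, 1))  # ~3.6 (GB of point data per partition)
print(round(runs))                      # ~57 runs per partition at 64MB each
```

Note the runs on disk carry whole records (not just the point keys), which is one plausible reason the spilled data reached ~2x the dataset size rather than the key-only estimate above.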
>>>>>>>> I even tried to create a BTree Index to see if that happens as well.
all >>>> files, >>>>>>> but worth checking. >>>>>>> >>>>>>> If that's not it, then can you share the full exception and stack >>>> trace? >>>>>>> >>>>>>> Ceej >>>>>>> aka Chris Hillery >>>>>>> >>>>>>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet < >>> wael@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> I just cleared the hard drives to get 80% free space. I still get >> the >>>>>>>> same >>>>>>>> issue. >>>>>>>> >>>>>>>> The data contains: >>>>>>>> 1- 2887453794 records. >>>>>>>> 2- Schema: >>>>>>>> >>>>>>>> create type CDRType as { >>>>>>>> >>>>>>>> id:uuid, >>>>>>>> >>>>>>>> 'date':string, >>>>>>>> >>>>>>>> 'time':string, >>>>>>>> >>>>>>>> 'duration':int64, >>>>>>>> >>>>>>>> 'caller':int64, >>>>>>>> >>>>>>>> 'callee':int64, >>>>>>>> >>>>>>>> location:point? >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet < >>> wael@gmail.com >>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>> Dears, >>>>>>>>> >>>>>>>>> I have a dataset of size 290GB loaded in a 3 NCs each of which >> has >>>>>>>>> >>>>>>>> 2x500GB >>>>>>>> >>>>>>>>> SSD. >>>>>>>>> >>>>>>>>> Each of NC has two IODevices (partitions) in each hard drive (i.e >>> the >>>>>>>>> total is 4 iodevices per NC). After loading the data, each >> Asterix >>>>>>>>> partition occupied 31GB. >>>>>>>>> >>>>>>>>> The cluster has about 50% free space in each hard drive >>>> (approximately >>>>>>>>> about 250GB free space in each hard drive). However, when I tried >>> to >>>>>>>>> >>>>>>>> create >>>>>>>> >>>>>>>>> an index of type RTree, I got an exception that no space left in >>> the >>>>>>>>> hard >>>>>>>>> drive during the External Sort phase. >>>>>>>>> >>>>>>>>> Is that normal ? 
>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> *Regards,* >>>>>>>>> Wail Alkowaileet >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> *Regards,* >>>>>>>> Wail Alkowaileet >>>>>>>> >>>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> *Regards,* >>>>> Wail Alkowaileet >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> *Regards,* >>>> Wail Alkowaileet >>>> >>> >> > > > > -- > > *Regards,* > Wail Alkowaileet Best, Jianfeng Jia PhD Candidate of Computer Science University of California, Irvine
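For context, the index being built in this thread would look roughly like the following AQL DDL (a sketch; the dataset and index names are hypothetical, since only the CDRType schema appears in the thread):

```aql
// Hypothetical dataset over the CDRType shown above.
create dataset CDRs(CDRType) primary key id;

// The R-Tree index on the optional point field whose build triggered the external sort.
create index cdrLocation on CDRs(location) type rtree;
```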
Re: Wishlist
Yes, it doesn’t seem right. I’ve moved it to the higher level. > On Jun 18, 2016, at 4:53 PM, Till Westmann <t...@westmann.org> wrote: > > Hi Jianfeng, > > it seems to me that the "wish list" [1] you started is not really a "design > doc". > Should we pull it up one level? > > Cheers, > Till > > [1] > https://cwiki.apache.org/confluence/display/ASTERIXDB/Things+will+be+easier+if+AsterixDB+have+these+features Best, Jianfeng Jia PhD Candidate of Computer Science University of California, Irvine
Remove the apache.org filter when using YourKit
Dear devs, I don’t know if any of you have hit the same problem as I did. When I used YourKit to profile the system, it only showed that one method, java.lang.Thread.run, took 99% of the time without further details, while jvisualvm could give the correct activity hotspots but is very hard to set up remotely. This is because the default filter in YourKit skips the “org.apache” packages :-) To change the filter: click “Settings” -> “Filters” and uncheck “org.apache”. Enjoy profiling Apache code! Best, Jianfeng Jia PhD Candidate of Computer Science University of California, Irvine
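For anyone setting up YourKit remote profiling, the usual approach is the JVM agent flag (a sketch; the agent path, port, and jar name below are illustrative, not from this thread, and the filter tweak above still applies once connected):

```shell
# Start the JVM with the YourKit agent attached; adjust the agent path for your platform.
java -agentpath:/opt/yourkit/bin/linux-x86-64/libyjpagent.so=port=10001 \
     -jar my-server.jar
```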