Re: [VOTE] Release Apache AsterixDB 0.9.5 and Hyracks 0.3.5 (RC3)

2019-09-13 Thread Chen Li
+1

On Thu, Sep 12, 2019 at 4:50 PM Mike Carey  wrote:

> +1
>
> - Successfully did NCService install and ran through the SQL++ 101
> exercises
>
> On 9/12/19 3:47 PM, Wail Alkowaileet wrote:
> > +1
> >
> > - Signatures and hashes ok.
> > - NCService binary works.
> > - Source compilation works.
> > - Executed the sample cluster. Ingested tweets and ran a few queries.
> >
> > On Tue, Sep 3, 2019 at 6:02 PM Ian Maxon  wrote:
> >
> >> Hi everyone,
> >>
> >> Please verify and vote on the latest release of Apache AsterixDB. This
> >> candidate fixes the binary name and missing Netty notice from RC2.
> >>
> >> The change that produced this release and the change to advance the
> >> version are
> >> up for review on Gerrit:
> >>
> >>
> >>
> https://asterix-gerrit.ics.uci.edu/#/q/status:open+owner:%22Jenkins+%253Cjenkins%2540fulliautomatix.ics.uci.edu%253E%22
> >>
> >> The release artifacts are as follows:
> >>
> >> AsterixDB Source
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5-source-release.zip
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5-source-release.zip.asc
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5-source-release.zip.sha256
> >>
> >> SHA256:be41051e803e5ada2c64f608614c6476c6686e043c47a2a0291ccfd25239a679
> >>
> >> Hyracks Source
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.5-source-release.zip
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.5-source-release.zip.asc
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.5-source-release.zip.sha256
> >>
> >> SHA256:b06fe983aa6837abe3460a157d7600662ec56181a43db317579f5c7ddf9bfc08
> >>
> >> AsterixDB NCService Installer:
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5.zip
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5.zip.asc
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5.zip.sha256
> >>
> >>
> >> SHA256:
> >>
> >> The KEYS file containing the PGP keys used to sign the release can be
> >> found at
> >>
> >> https://dist.apache.org/repos/dist/release/asterixdb/KEYS
> >>
> >> RAT was executed as part of Maven via the RAT maven plugin, but
> >> excludes files that are:
> >>
> >> - data for tests
> >> - procedurally generated,
> >> - or source files which come without a header mentioning their license,
> >>but have an explicit reference in the LICENSE file.
> >>
> >>
> >> The vote is open for 72 hours, or until the necessary number of votes
> >> (3 +1) has been reached.
> >>
> >> Please vote
> >> [ ] +1 release these packages as Apache AsterixDB 0.9.5 and
> >> Apache Hyracks 0.3.5
> >> [ ] 0 No strong feeling either way
> >> [ ] -1 do not release one or both packages because ...
> >>
> >> Thanks!
> >>
> >
>
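
For readers who want to reproduce the "signatures and hashes ok" check reported above, here is a minimal sketch in Python (an illustration only, not the project's official verification procedure): it recomputes the SHA-256 of the AsterixDB source zip listed in the thread and compares it to the posted digest, then calls out to gpg for the signature check, assuming gpg is installed and the KEYS file from dist.apache.org has already been imported.

import hashlib
import subprocess

# Artifact name and digest as posted in the vote thread above.
ARTIFACT = "apache-asterixdb-0.9.5-source-release.zip"
EXPECTED_SHA256 = "be41051e803e5ada2c64f608614c6476c6686e043c47a2a0291ccfd25239a679"

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    actual = sha256_of(ARTIFACT)
    print("SHA256 ok" if actual == EXPECTED_SHA256 else f"SHA256 MISMATCH: {actual}")

    # Signature check (assumes `gpg --import KEYS` was run beforehand).
    subprocess.run(["gpg", "--verify", ARTIFACT + ".asc", ARTIFACT], check=True)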


Re: what parallel DBMS is AsterixDB compared against?

2019-08-07 Thread Chen Li
About one year ago some students in China did an AsterixDB/Greenplum comparison
and wrote a report in Chinese.  If needed, I can contact them to get the report
again, since the old one was not kept.

Chen

On Wed, Aug 7, 2019 at 2:44 PM Michael Carey  wrote:

> You should be fine:  "The Greenplum project is released under the Apache
> 2 license." (quoting from their github repo)  :-)
>
> This sounds like an interesting undertaking - keep us posted and feel
> free to come here for support/Q's/tuning thoughts/etc.  You should be
> able to grab a copy of BigFUN and its data generator if using it is of
> interest; exactly where to find those will be listed in the BigFUN paper
> references.
>
> Cheers,
>
> Mike
>
> On 8/7/19 2:35 PM, Karl Pietrzak wrote:
> > Thanks, Michael!  I'm looking to compare AsterixDB against Greenplum
> > and post the results.  Anyone know if Greenplum has the "DeWitt clause"?
> >
> > On Wed, Aug 7, 2019 at 4:20 PM Michael Carey wrote:
> >
> > (Meant to reply to the list!)
> >
> >
> >  Forwarded Message 
> > Subject:  Re: what parallel DBMS is AsterixDB compared against?
> > Date: Wed, 7 Aug 2019 13:18:04 -0700
> > From: Michael Carey 
> > 
> > To:   Karl Pietrzak <kap4...@gmail.com>
> >
> >
> > It's known in the literature as "System X" - other work, e.g., the
> > early MIT/Brown work in 2008-2009, used that same system in their
> > work on commercial databases vs. Hadoop and referred to it as
> > "System X" (so we did the same, name-wise).  Most for-pay database
> > vendors' license agreements have a clause known informally as the
> > "DeWitt clause" that prohibit publishing any performance results
> > from their systems, so the tradition in the DBMS academic
> > benchmarking world is to not name the systems.  This one was a
> > commercial shared-nothing parallel DBMS that is known to be a
> > solid performer (it wasn't just a strawman) and in the end the
> > graduate student who ran the numbers visited them on-site to get
> > some help in properly setting up the system.
> >
> > If you go to http://asterix.ics.uci.edu//publications.html you can
> > find a copy of the BigFUN benchmark paper (which this was a
> > preliminary version of) as well as some other papers that might be
> > of interest.
> >
> > Cheers,
> >
> > Mike
> >
> > On 8/7/19 10:52 AM, Karl Pietrzak wrote:
> >> Hi everyone!
> >>
> >> Looking at the home page
> >> (https://asterixdb.apache.org/index.html), I'm wondering what
> >> parallel DBMS AsterixDB is being compared against?
> >>
> >> Is there more information on this benchmark, too?
> >>
> >> Thanks!
> >>
> >> --
> >> Karl
> >
> >
> >
> > --
> > Karl
>


Re: Recent change on removing statement as request body

2019-04-08 Thread Chen Li
We talked.  The conclusion was that it's OK to change the interface, as
long as it works properly.  Chen Luo mentioned that the new interface may
not work correctly.

On Mon, Apr 8, 2019 at 1:32 PM Till Westmann  wrote:

> Just one request: please bring the results of your f2f meeting back to
> the list.
>
> Thanks,
> Till
>
> On 8 Apr 2019, at 9:20, Chen Li wrote:
>
> > Qiushi and Chen Luo: please do an F2F meeting to discuss this issue and
> > make
> > a plan.  Please include me in the meeting.
> >
> > On Sun, Apr 7, 2019 at 11:05 PM Till Westmann 
> > wrote:
> >
> >> Hi Chen,
> >>
> >> We could revert the change, but I'd prefer to fix the issues that we
> >> have in
> >> the (incompletely) documented API.
> >> I think that the solution for multiple statements should be the
> >> "multi-statement" parameter. However I thought that the default value
> >> for this
> >> parameter was "true", so I'm not sure what causes the additional
> >> statements to
> >> be ignored.
> >> And I have no idea why double quotes would be ignored by any parser.
> >> Is
> >> this
> >> the SQL++ parser that causes problems?
> >>
> >> I'd like to get this resolved (and the docs updated to the point
> >> where
> >> they are
> >> useful) as soon as possible - we clearly would like to keep
> >> Cloudberry
> >> up and
> >> running.
> >>
> >> Could you send an example (or file an issue) that shows the problem?
> >>
> >> Cheers,
> >> Till
> >>
> >> On 6 Apr 2019, at 16:18, Chen Luo wrote:
> >>
> >>> Hi devs,
> >>>
> >>> I noticed there is a recent change on master that removes the
> >>> undocumented
> >>> ability to use the request body as the statement [1]. This patch
> >>> breaks
> >>> many of my experiment scripts and *many data preparation scripts
> >>> used
> >>> by
> >>> Cloudberry.* Also, I had a hard time modifying my scripts to use the
> >>> "statement" parameter, for two reasons:
> >>>
> >>>1. It seems that only the first statement is executed but the
> >>> rest
> >>> are
> >>>simply ignored;
> >>>2. Double quotes are always ignored by the parser.
> >>>
> >>> Can this patch be reverted? If not, can we at least update our wiki
> >>> [2][3]
> >>> to give examples about multiple queries and handling double quotes?
> >>>
> >>> Best regards,
> >>> Chen Luo
> >>>
> >>> [1] https://asterix-gerrit.ics.uci.edu/#/c/3267/
> >>> [2]
> >>>
> >>
> https://cwiki.apache.org/confluence/display/ASTERIXDB/New+HTTP+API+Design
> >>> [3] https://ci.apache.org/projects/asterixdb/api.html
> >>
>


Re: Recent change on removing statement as request body

2019-04-08 Thread Chen Li
Qiushi and Chen Luo: please do an F2F meeting to discuss this issue and make
a plan.  Please include me in the meeting.

On Sun, Apr 7, 2019 at 11:05 PM Till Westmann  wrote:

> Hi Chen,
>
> We could revert the change, but I'd prefer to fix the issues that we
> have in
> the (incompletely) documented API.
> I think that the solution for multiple statements should be the
> "multi-statement" parameter. However I thought that the default value
> for this
> parameter was "true", so I'm not sure what causes the additional
> statements to
> be ignored.
> And I have no idea why double quotes would be ignored by any parser. Is
> this
> the SQL++ parser that causes problems?
>
> I'd like to get this resolved (and the docs updated to the point where
> they are
> useful) as soon as possible - we clearly would like to keep Cloudberry
> up and
> running.
>
> Could you send an example (or file an issue) that shows the problem?
>
> Cheers,
> Till
>
> On 6 Apr 2019, at 16:18, Chen Luo wrote:
>
> > Hi devs,
> >
> > I noticed there is a recent change on master that removes the
> > undocumented
> > ability to use the request body as the statement [1]. This patch
> > breaks
> > many of my experiment scripts and *many data preparation scripts used
> > by
> > Cloudberry.* Also, I had a hard time modifying my scripts to use the
> > "statement" parameter, for two reasons:
> >
> >1. It seems that only the first statement is executed but the rest
> > are
> >simply ignored;
> >2. Double quotes are always ignored by the parser.
> >
> > Can this patch be reverted? If not, can we at least update our wiki
> > [2][3]
> > to give examples about multiple queries and handling double quotes?
> >
> > Best regards,
> > Chen Luo
> >
> > [1] https://asterix-gerrit.ics.uci.edu/#/c/3267/
> > [2]
> >
> https://cwiki.apache.org/confluence/display/ASTERIXDB/New+HTTP+API+Design
> > [3] https://ci.apache.org/projects/asterixdb/api.html
>
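
For anyone hitting the same problem described in this thread, a minimal sketch of sending statements via the "statement" parameter (instead of the raw request body) might look like the Python snippet below. The endpoint path and the "pretty" parameter follow the documented query service API; the host/port, the example dataverse/dataset names, and the exact behavior of the "multi-statement" parameter discussed above are assumptions to be checked against the API docs linked in the thread ([2][3]).

import requests

# Assumed local AsterixDB query service endpoint; adjust host/port as needed.
QUERY_SERVICE = "http://localhost:19002/query/service"

# Two statements in one request. The dataverse/dataset names are only for
# illustration. Double quotes inside the statement are safe here because the
# whole statement is sent form-encoded instead of being pasted into a URL.
statement = """
USE TinySocial;
SELECT VALUE u FROM GleambookUsers u WHERE u.name = "MargaritaStoddard";
"""

resp = requests.post(
    QUERY_SERVICE,
    data={
        "statement": statement,
        # Discussed above as defaulting to true; passed explicitly for clarity.
        "multi-statement": "true",
        "pretty": "true",
    },
)
resp.raise_for_status()
body = resp.json()
print(body["status"])
print(body.get("results"))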


Re: Time to deprecate AQL?

2017-09-07 Thread Chen Li
Let's discuss how to move AQL+ to SQL++ after Taewoo comes back.

On Thu, Sep 7, 2017 at 12:10 PM, Taewoo Kim  wrote:

> For similarity join, we use AQL+, which is based on AQL. I think deprecating
> (not removing) AQL is OK. Ultimately, AQL+ should be converted to SQL++ :-)
>
> Best,
> Taewoo
>
> On Thu, Sep 7, 2017 at 9:04 PM, Steven Jacobs  wrote:
>
> > I’ll give the BADest +1 I can :)
> > Steven
> >
> > On Thu, Sep 7, 2017 at 8:50 PM Gerald Sangudi  wrote:
> >
> > > :-)
> > >
> > > On Sep 7, 2017 11:44 AM, "Michael Carey"  wrote:
> > >
> > > As AsterixDB evolves, and additional features are added - e.g.,
> DISTINCT
> > > aggregate support, or properly implemented query-bodied functions -
> > > supporting two query languages is hugely expensive:  Updating two
> > grammars,
> > > parsers, rules, tests, ... IMO it is time to let go of AQL as an
> > externally
> > > supported interface to AsterixDB and only have SQL++ going forward.  I
> > > think "everyone" has migrated - and if not we should force that
> > migration.
> > > (Cloudberry is on SQL++ nowadays, BAD is on SQL++ nowadays, ...)  Any
> > > objections?  If not, I think we should make this decision officially
> and
> > > stop putting energy into carrying the AQL legacy around with us.
> > Thoughts?
> > >
> >
>


Re: How to Link (in Gerrit) your old Yahoo ID to a Google/Github login

2017-02-03 Thread Chen Li
Ian did some magic.  Now I log in using my Google account and can
see my real (better) name :-)

Chen

On Thu, Feb 2, 2017 at 10:11 PM, Chen Li <che...@gmail.com> wrote:

> I followed these steps and linked my gmail account.  But the gerrit site
> shows a weird account for me "Anon. E. Moose (1000151)".  When I tried to
> "add me" to the changes at https://asterix-gerrit.ics.uci.edu/#/c/1481/,
> it lists me as "1000151".  Any idea how to fix it?
>
> Chen
>
> On Mon, Jan 30, 2017 at 4:06 PM, Ian Maxon <ima...@uci.edu> wrote:
>
>> Hey all,
>> Once Gerrit is actually upgraded (hopefully tonight) you should go
>> ahead and link a Google or Github account to your existing account
>> that uses a Yahoo ID. From that point on you can sign in via that auth
>> method and not bother with Yahoo anymore.
>>
>> Firstly, however, when the upgraded site comes online, _DON'T_ try to
>> log in via Google/Github if you already have an account on Gerrit. This
>> creates two accounts! What you really want is to link your
>> Google/Github identity to your existing account.
>>
>> To be able to log in via Google/Github without creating two accounts,
>> log in with your Yahoo ID as normal once the upgrade is finished, then
>> click your name in the top right hand corner and go to "Settings".
>> Then, on the left-hand side go to "Identities". Then, click the "Link
>> Another Identity" button on the right-hand side. This will take you to
>> the login screen again, but this time instead of using your Yahoo ID,
>> just login with Google or Github and you will have added that as a way
>> to log into your account.
>>
>> For those that remember, this is basically the same process we had
>> when Google deprecated OpenID, just in reverse this time around.
>>
>> Thanks,
>> - Ian
>>
>
>


Re: [VOTE] Release Apache AsterixDB 0.9.0 and Hyracks 0.3.0 (RC2)

2017-01-22 Thread Chen Li
+1


Re: Function name change: contains() -> string-contains()

2016-09-15 Thread Chen Li
For full-text search, I like "ftcontains()" since it's very intuitive.

The syntax for advanced full-text features such as stop words, analyzers, and
languages needs a separate discussion.

Chen

On Thu, Sep 15, 2016 at 5:58 PM, Taewoo Kim  wrote:

> @Till: I see. Thanks for the suggestion. It's much clearer now.
>
> Best,
> Taewoo
>
> On Thu, Sep 15, 2016 at 5:58 PM, Till Westmann  wrote:
>
> > And as it turns out, we already have some infrastructure to translate a
> > constant record constructor expression into a record in
> > LangRecordParseUtil.
> > So supporting that wouldn’t be too painful.
> >
> > Cheers,
> > Till
> >
> >
> > On 15 Sep 2016, at 17:41, Till Westmann wrote:
> >
> > One option to express those parameters would be to pass in a (compile
> time
> >> constant) record/object. E.g.
> >>
> >> where ftcontains($o.title, ["hello","hi"],
> >>  { "combine": "and", "stop list": "default" })
> >>
> >> That way we could have named optional parameters (please ignore the
> >> ugliness of
> >> my chosen parameters) which avoid the problem of dealing with positions.
> >> We do have a nested datamodel, so we could put it to good use here :)
> >>
> >> Does this make sense?
> >>
> >> Cheers,
> >> Till
> >>
> >> On 15 Sep 2016, at 16:26, Taewoo Kim wrote:
> >>
> >> @Till: we can add whether the given search is an AND/OR search, a stop list,
> >>> and/or a stemming method. For example, if we use ftcontains(), then it
> >>> might
> >>> look like:
> >>>
> >>> 1) where ftcontains($o.title, "hello"): find $o where the title field
> >>> contains hello.
> >>> 2) where ftcontains($o.title, ["hello","hi"], any): find $o where the
> >>> title
> >>> field contains hello *and/or* hi.
> >>> 3) where ftcontains($o.title, ["hello","hi"], all): find $o where the
> >>> title
> >>> field contains both hello *and* hi.
> >>> 4) where ftcontains($o.title, ["hello","hi"], all, defaultstoplist):
> find
> >>> $o where the title field contains both hello *and* hi. Also apply the
> >>> default stoplist to the search. The default stop list contains a
> number
> >>> of common English words that can be filtered out.
> >>>
> >>> The issue here is that the position of each parameter should be
> observed
> >>> (e.g., the third one indicates whether we do disjunctive/conjunctive
> >>> search. The fourth one tells us which stop list we use). So, if we have
> >>> three parameters, how to specify/omit these becomes a challenge.
> >>>
> >>> Best,
> >>> Taewoo
> >>>
> >>> On Thu, Sep 15, 2016 at 4:12 PM, Till Westmann 
> wrote:
> >>>
> >>> Makes sense to me (especially as I always think about this specific one
>  as
>  "ftcontains" :) ).
> 
>  Another thing you mentioned is about the parameters that will get
> added
>  in
>  the
>  future. Could you provide an example for this?
> 
>  Cheers,
>  Till
> 
>  On 15 Sep 2016, at 15:37, Taewoo Kim wrote:
> 
>  Maybe we could come up with a function form - *ftcontains*(). Here, ft
>  is
> 
> >
> > an abbreviation for full-text. This function replaces "contains text"
> > in
> > XQuery spec. An example might be:
> >
> > XQuery spec: where $o.titile contains text "hello"
> > AQL: where ftcontains($o.title, "hello")
> >
> > Best,
> > Taewoo
> >
> > On Thu, Sep 15, 2016 at 3:18 PM, Taewoo Kim 
> > wrote:
> >
> > @Till: Got it. I agree with your opinion. The issue here for the
> > full-text
> >
> >> search is that many function parameters that controls the behavior
> of
> >> full-text search will be added in the future. Maybe this is not the
> >> issue?
> >> :-)
> >>
> >> Best,
> >> Taewoo
> >>
> >> On Thu, Sep 15, 2016 at 3:11 PM, Till Westmann 
> >> wrote:
> >>
> >> Hi,
> >>
> >>>
> >>> I think that our challenge here is that XQuery is very liberal in
> >>> the
> >>> introduction of new keywords, as the grammar is keyword free.
> >>> However,
> >>> they
> >>> often use combinations of words "contain" "text" to disambiguate.
> >>> AQL on the other hand is not keyword free, and so each time we
> >>> introduce a
> >>> new
> >>> one, we create a backwards compatibility problem. It seems that for
> >>> AQL
> >>> using a
> >>> function-based syntax would create fewer problems.
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On 2 Mar 2016, at 18:25, Taewoo Kim wrote:
> >>>
> >>> Hello All,
> >>>
> >>>
>  I would like to suggest a change to a current function name. I am
>  currently
>  working on Full Text Search features. XQuery Full-text search spec
>  [1]
>  states that for a full-text search, the syntax is *RangeExpr (
>  "contains"
>  "text" FTSelection FTIgnoreOption? )?*. As you see, we are 

Re: Creating RTree: no space left

2016-09-15 Thread Chen Li
@Wail: as a use case related to selectivity, our current Cloudberry
prototype doesn't benefit from R-tree when the user is analyzing the data
for the entire US.  But we expect to have R-tree benefits when a user zooms
into a small region.

On Thu, Sep 15, 2016 at 8:25 AM, Wail Alkowaileet 
wrote:

> Hi Ahmed and Mike,
>
> @Ahmed
> I actually did a small experiment where I loaded about 1/5 of the data (so
> I can index it) and seems that the R-Tree was really useful for querying
> small regions or neighborhoods.
> I also tried the B-Tree and it was slower than a full scan.
>
> @Mike
> Unfortunately, I cannot still even after anonymization :-)
>
>
> On Wed, Sep 14, 2016 at 11:29 PM, Mike Carey  wrote:
>
> > Interesting point, so to speak.  @Wail, any chance you could post a
> Google
> > maps screenshot showing a visualization of the points in this dataset on
> > the underlying geographic region?  (If the dataset is shareable in that
> > anonymized form?)  I would think an R-tree would still be good for
> > small-region geo queries - possibly shrinking the candidate object set
> by a
> > factor of 10,000 - so still useful - and we also do index-AND-ing now, so
> > we would also combine that shrinkage by other index-provided shrinkage on
> > any other index-amenable predicates.  I think the queries are still
> spatial
> > in nature, and the only AsterixDB choices for that are R-tree.  (We did
> > experiments with things like Hilbert B-trees, but the results led to the
> > conclusion that the code base only needs R-trees for spatial data for the
> > forseeable future - they just work too well and in a no-tuning-required
> > fashion :-))
> >
> >
> >
> > On 9/14/16 12:49 PM, Ahmed Eldawy wrote:
> >
> >> Looks like an interesting case. Just a small question. Are you sure a
> >> spatial index is the right one to use here? The spatial attribute looks
> >> more like a categorization and a hash or B-tree index could be more
> >> suitable. As far as I know, the spatial index in AsterixDB is a
> secondary
> >> R-tree index which, like any other secondary index, is only good for
> >> retrieving a small number of records. For this dataset, it seems that
> any
> >> small range would still return a huge number of records.
> >>
> >> It is still interesting to further investigate and fix the sort issue
> but
> >> I
> >> mentioned the usage issue to offer a different perspective.
> >>
> >> Thanks
> >> Ahmed
> >>
> >> On Wed, Sep 14, 2016 at 10:30 AM Mike Carey  wrote:
> >>
> >> ☺!
> >>>
> >>> On Sep 14, 2016 1:11 AM, "Wail Alkowaileet" 
> wrote:
> >>>
> >>> To be exact
>  I have 2,255,091,590 records and 10,391 points :-)
> 
>  On Wed, Sep 14, 2016 at 10:46 AM, Mike Carey 
> wrote:
> 
>  Thx!  I knew I'd meant to "activate" the thought somehow, but couldn't
> > remember having done it for sure.  Oops! Scattered from VLDB, I
> >
>  guess...!
> >>>
> 
> >
> > On 9/13/16 9:58 PM, Taewoo Kim wrote:
> >
> > @Mike: You filed an issue -
> >> https://issues.apache.org/jira/browse/ASTERIXDB-1639. :-)
> >>
> >> Best,
> >> Taewoo
> >>
> >> On Tue, Sep 13, 2016 at 9:28 PM, Mike Carey 
> >>
> > wrote:
> >>>
>  I can't remember (slight jetlag? :-)) if I shared back to this list
> >>
> > one
> >>>
>  theory that came up in India when Wail and I talked F2F - his data
> >>>
> >> has
> >>>
>  a
> 
> > lot of duplicate points, so maybe something goes awry in that case.
> >>>
> >> I
> >>>
>  wonder if we've sufficiently tested that case?  (E.g., what if there
> >>>
> >> are
> 
> > gazillions of records originating from a small handful of points?)
> >>>
> >>>
> >>> On 8/26/16 9:55 AM, Taewoo Kim wrote:
> >>>
> >>> Based on a rough calculation, per partition, each point field takes
> >>>
> >> 3.6GB
> 
> > (16 bytes * 2887453794 records / 12 partition). To sort 3.6GB, we
> 
> >>> are
> >>>
>  generating 625 files (96MB or 128MB each) = 157GB. Since Wail
> 
> >>> mentioned
> 
> > that there was no issue when creating a B+ tree index, we need to
> 
> >>> check
> 
> > what SORT process is required by R-Tree index.
> 
>  Best,
>  Taewoo
> 
>  On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia <
> 
> >>> jianfeng@gmail.com
> >>>
>  wrote:
> 
>  If all of the file names start with “ExternalSortRunGenerator”,
> then
>  they
> 
>  are the first round files which can not be GCed.
> > Could you provide the query plan as well?
> >
> > On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <
> wael@gmail.com
> > wrote:
> >
> > Hi Ian and Pouria,
> >
> >> The 
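
As a quick aside on the arithmetic quoted in this thread, the per-partition estimate can be reproduced as follows (a back-of-the-envelope sketch only; the 16-byte point size, record count, and 12 partitions are the figures stated above, and the run-file and total-size numbers depend on details not shown here):

# Reproduce the per-partition sort-input estimate quoted in the thread:
# 16 bytes per point * 2,887,453,794 records spread over 12 partitions.
records = 2_887_453_794
point_bytes = 16
partitions = 12

per_partition_bytes = records * point_bytes / partitions
print(f"{per_partition_bytes / 2**30:.2f} GiB per partition")  # ~3.59 GiB, i.e. the ~3.6GB above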

Re: index-only plans

2016-08-12 Thread Chen Li
I am always a big fan of separating a big merge into multiple small
changes.  It will be good to do this "partitioning."

Chen

On Thu, Aug 11, 2016 at 2:46 PM, Taewoo Kim  wrote:

> Thanks Till for reviewing this giant patch set.
>
> At this moment, what I can try to do is to remove all the test cases
> and changes that are related to full-text search preparation (changing the
> function name of "contains" to "string-contains"), since I thought this
> index-only plan branch could be merged first.
>
> I tried to separate logical LIMIT push-down to the index search and
> index-only plan. But, it turns out that it was hard. Other than this, all
> changes are related to index-only plan part (most of them are accessMethod
> related.) In addition, Young-Seok already had one round.
>
>
> Best,
> Taewoo
>
> On Thu, Aug 11, 2016 at 2:43 PM, Till Westmann  wrote:
>
> > Hi,
> >
> > we still have the big change on index-only plans outstanding. I think
> that
> > it would be good to have that feature. However, at it’s current size
> (+45K
> > lines, -15K lines) it is very (!) difficult to review. So I think that
> one
> > approach to get there would be to break it down into smaller, more
> > achievable
> > steps.
> > I’ve added a few comments to the review with thoughts I had to do that.
> > What do you think?
> > Is that a good approach? Is it feasible?
> > Are there other ways?
> >
> > Thanks for your thoughts,
> > Till
> >
>


Re: questions about index-only change

2016-07-12 Thread Chen Li
Per our discussion earlier, it will be really good to add these
documents (at least their URLs) to our documentation site to
accumulate the knowledge.

Chen

On Fri, Jul 8, 2016 at 1:43 PM, Yingyi Bu  wrote:
> Cool, thanks a lot, Taewoo!
>
> Best,
> Yingyi
>
> On Fri, Jul 8, 2016 at 1:36 PM, Taewoo Kim  wrote:
>
>> Sure. These are the design docs. Some changes have been made that I need to
>> reflect in them, but they still show the main design.
>>
>> Index-only
>>
>> https://docs.google.com/presentation/d/1HcoQwaTQu8K2Xdzg46RZP60LqON2oKnWkZx1z1buF1U/edit?usp=sharing
>>
>> Limit Push-down
>>
>> https://docs.google.com/presentation/d/1lvSLF9j7pcKo2nHkVoiOD9vNSFCNsVDGQYnYCdHhngk/edit?usp=sharing
>>
>> Regarding the numbers, I collected some numbers in the past using
>> Pouria's BigFUN benchmark. I used my own version of the queries. The result is
>> not based on the current design. The big difference is that now we are using
>> instantTryLock rather than tryLock. But you can still get a sense.
>>
>> https://docs.google.com/spreadsheets/d/1YuTuw24TUthr0YhEHMmGr9E4tCYAFxJjlg3S67zRY-M/edit?usp=sharing
>>
>>
>>
>>
>> Best,
>> Taewoo
>>
>> On Fri, Jul 8, 2016 at 1:09 PM, Yingyi Bu  wrote:
>>
>>> Hi Taewoo,
>>>
>>> I have a few questions regarding your index-only change (I'm cc-ing
>>> dev just in case more people are interested in the topic.):
>>>
>>> 1. Is there any design doc or write up for the index-only change?
>>>
>>> 2. Do you have ddls/queries that are designed for the index-only
>>> performance testing?  Do you have some initial performance numbers that
>>> compare index-only plans and primary-index-access plans?
>>>
>>> Thanks!
>>>
>>> Yingyi
>>>
>>>
>>>
>>


Re: new AsterixDB web interface demo up and running

2016-07-12 Thread Chen Li
This is an old discussion.  The URL http://173.82.2.197:19006/ doesn't
work for me now.  I assume it's no longer available.  Is there any way
to see the new UI?

Chen

On Fri, Jun 17, 2016 at 12:10 AM, Till Westmann  wrote:
> Results for multiple queries are not that easy for the new HTTP API design
> [1] that we’re trying to finish right now. For that design we’re planning to
> have many statements, but to only return the result of the last statement.
> The challenge with multiple results is that the newer API also returns quite
> a bit of metadata (errors, metrics, signature) which would also need to be
> available in multiples and complicate the structure of the result further.
>
> Cheers,
> Till
>
> [1]
> https://cwiki.apache.org/confluence/display/ASTERIXDB/New+HTTP+API+Design
>
>
> On 17 Jun 2016, at 2:45, Mike Carey wrote:
>
>> Sounds like a bug in the underlying http UI?!  It would be nice to
>> preserve the multiple-result-area approach that the existing web UI uses in
>> that case, somehow...  It's interesting that this hasn't come up before - we
>> should have test cases for the basic UI for that, I would think?
>>
>>
>> On 6/16/16 5:24 PM, Ian Maxon wrote:
>>>
>>> Kaveen and I talked about this earlier today actually, the result (from
>>> *DB) for the two above queries is actually fine and parseable JSON. It's
>>> just a labeling issue as the result is shown as if it were 3 records
>>> rather
>>> than 3 lists.
>>> A more vexing question however that came up is what to do about multiple
>>> queries in one submission. Right now those come back as multiple JSON
>>> objects appended to each other apparently.
>>>
>>> On Thu, Jun 16, 2016 at 5:14 PM, Mike Carey  wrote:
>>>
 @Ian & @Chris:  Can you provide some helpful hints in the direction of
 parsing returned ADM?  (Since you are kind of addressing that as we
 speak
 for other reasons?)

 @Kaveen:  Off to a cool start!  In terms of the sorts of things that can
 come back, *conceptually*, the return clause of a query can yield a
 scalar
 value, an ordered list, an unordered list, or a record.  (The various
 possible scalar values are all of the data types listed in the ADM data
 model spec.)  A for-clause actually always returns a list of whatever
 the
 return clause says to return - and a let-clause (I believe) or a
 standalone
 expression can return a singleton object (of any of the aforementioned
 forms) if I'm not mistaken.  For testing the Web UI, it would probably
 be
 worth coming up with a set of test queries that returns each of those
 things.  (Mixed of them are also possible - life in semistructured data
 land can be messy.)

 Cheers,

 Mike


 On 6/16/16 11:11 AM, Kaveen Rodrigo wrote:

> oh I see, Thank you Yingyi,
>
> I did update the VPS with the fixes for Q1, the only way to fix Q2 and
> the
> new query is to write a little parser since that output isn't valid
> json.
>
> cheers,
> Kaveen
>
> On 16 June 2016 at 22:59, Yingyi Bu  wrote:
>
> Any valid ADM (Asterix Data Model) instance can be a result row.
>>
>> ADM: https://ci.apache.org/projects/asterixdb/aql/datamodel.html
>>
>> A single curly bracket means a record constructor.  A record consists
>> of
>> fields, where each field is an name-value pair.
>> Therefore,
>> {
>>   [1,2,3],
>>   [2,3,4],
>>   [5,6,7]
>> }
>> cannot be a valid result.
>>
>> But you are able to get
>>   [1,2,3],
>>   [2,3,4],
>>   [5,6,7]
>>
>> by running the following query:
>>
>> for $x in [
>>   [1,2,3],
>>   [2,3,4],
>>   [5,6,7]
>> ]
>> return $x;
>>
>> Let me know if you have more questions.
>>
>> Best,
>> Yingyi
>>
>>
>>
>>
>>
>>
>> On Thu, Jun 16, 2016 at 10:23 AM, Kaveen Rodrigo <
>> u.k.k.rodr...@gmail.com>
>> wrote:
>>
>> Hey Yingyi,
>>>
> >>> I fixed that issue, but didn't update the VPS yet.  One question: can
> >>> there
> >>> be
> >>> results which return arrays?
>>>
>>> for example
>>> {
>>>   [1,2,3],
>>>   [2,3,4],
>>>   [5,6,7]
>>> }
>>>
> >>> If that's so, and if you have some time, can you give me an AQL query
>>> which
>>> will produce something like that.
>>>
>>> thanks in advance,
>>> Kaveen
>>>
>>> On 16 June 2016 at 22:28, Yingyi Bu  wrote:
>>>
>>> Awesome!  Thanks, Kaveen!

 Best,
 Yingyi

 On Thu, Jun 16, 2016 at 9:56 AM, Kaveen Rodrigo <

>>> u.k.k.rodr...@gmail.com
>>> wrote:

 Yikes, Thanks Yingyi,
>
> I never expected the results array to contain values, I'll get 
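
A side note on the "multiple JSON objects appended to each other" behavior Ian describes above: if a client does receive such a concatenated stream, the values can be split apart with the standard-library JSON decoder rather than a hand-written parser. This is a generic sketch of that technique (the example payload is made up), not a description of what the web UI or the HTTP API actually emits.

import json

def split_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        while pos < len(text) and text[pos].isspace():
            pos += 1  # skip whitespace between objects
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Hypothetical payload: three result objects appended to each other.
blob = '{"results": [1, 2, 3]}{"results": [2, 3, 4]}{"results": [5, 6, 7]}'
for obj in split_concatenated_json(blob):
    print(obj["results"])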

Re: [DRAFT] [REPORT] Apache AsterixDB June 2016

2016-06-07 Thread Chen Li
Looks good to me.

Chen

On Tue, Jun 7, 2016 at 2:56 PM, Till Westmann  wrote:

> Ouch! Thanks for catching this.
> I’m very sorry, Kaveen!
>
> Here’s a fixed proposal.
>
> Till
>
> —
>
> Description:
>
> Apache AsterixDB is a scalable big data management system (BDMS) that
> provides storage, management, and query capabilities for large collections
> of semi-structured data.
>
> Activity:
>
> - The general state of the project (and the report) are largely unchanged
>   from last month. Development and discussions are active, the community is
>   healthy and engaged. The move of infra sources has been started (the
>   mailing lists were moved), but the migration ticket in JIRA is still
>   "waiting for infra".
> - Kaveen Rodrigo has started working with the AsterixDB community on a GSoC
>   project.
>
> Issues:
>
> - TLP migration JIRA tasks not started yet:
>   https://issues.apache.org/jira/browse/INFRA-11789
>
> PMC/Committership changes:
>
> There have been no changes since graduation. The last committer/PPMC member
> added was Michael Blow on 2016-03-28.
>
> Releases:
>
> AsterixDB graduated from the Incubator on April 20, 2016. The last releases
> were on February 26, 2016: AsterixDB 0.8.8-incubating and Hyracks
> 0.2.17-incubating.
>
>
> On 7 Jun 2016, at 23:51, Steven Jacobs wrote:
>
> That is the wrong student for the AsterixDB Summer of Code project. Menaka is working
>> on the VXQuery project. The Asterix student is
>>
>> Kaveen Rodrigo
>>
>>
>> Steven
>>
>> On Tue, Jun 7, 2016 at 2:25 PM, Till Westmann  wrote:
>>
>> Hi,
>>>
>>> here’s a proposal for this month’s board report.
>>> Thoughts/additions/concerns?
>>>
>>> Thanks,
>>> Till
>>>
>>> —
>>>
>>>
>>> Description:
>>>
>>> Apache AsterixDB is a scalable big data management system (BDMS) that
>>> provides storage, management, and query capabilities for large
>>> collections
>>> of semi-structured data.
>>>
>>> Activity:
>>>
>>> - The general state of the project (and the report) are largely unchanged
>>>   from last month. Development and discussions are active, the community
>>> is
>>>   healthy and engaged. The move of infra sources has been started (the
>>>   mailing lists were moved), but the migration ticket in JIRA is still
>>>   "waiting for infra".
>>> - Menaka Madushanka has started working with the AsterixDB community on a
>>>   GSoC project.
>>>
>>> Issues:
>>>
>>> - TLP migration JIRA tasks not started yet:
>>>   https://issues.apache.org/jira/browse/INFRA-11789
>>>
>>> PMC/Committership changes:
>>>
>>> There have been no changes since graduation. The last committer/PPMC
>>> member
>>> added was Michael Blow on 2016-03-28.
>>>
>>> Releases:
>>>
>>> AsterixDB graduated from the Incubator on April 20, 2016. The last
>>> releases
>>> were on February 26, 2016: AsterixDB 0.8.8-incubating and Hyracks
>>> 0.2.17-incubating.
>>>
>>>


Re: User Define Function (UDF) in AsterixDB

2016-06-04 Thread Chen Li
We had several separate discussions about UDF.  Is it possible to take
this chance to polish the documentation so that other users can rely
on the documentation to get started?

Chen

On Fri, Jun 3, 2016 at 12:08 AM, Xikui Wang  wrote:
> Hi Heri,
>
> Thanks for sharing the document. It is useful as the general structure of
> UDF remains the same.
>
> Best,
> Xikui
>
> On Thu, Jun 2, 2016 at 11:34 PM, Heri Ramampiaro  wrote:
>
>> Xikui,
>>
>> Enclosed is an instruction based on the older version of feeds and UDF
>> that perhaps could help you
>> figure out the principle behind installing external libs in AsterixDB.
>>
>> Best,
>> -heri
>>
>>
>>
>>
>> > On Jun 2, 2016, at 23:44, Xikui Wang  wrote:
>> >
>> > Hi Abdullah,
>> >
>> > Thanks for your help. I met an error when I was trying to execute
>> 'install
>> > externallibtest testlib PATH/TO/testlib-zip-binary-assembly.zip' from the
>> > web query interface. Perhaps I used it in the wrong way?
>> >
>> > Best,
>> > Xikui
>> >
>> > On Thu, Jun 2, 2016 at 2:26 PM, abdullah alamoudi 
>> > wrote:
>> >
>> >> Hi Xikui,
>> >> 1. How to install UDF on instance running from Eclipse+
>> >> AsterixHyracksIntegrationUtil?
>> >>
>> >> There are a few external library test cases, you can look at them and
>> see
>> >> how we test those. One thing you will notice is that we only test a few
>> >> examples. Clearly, we can do better. You can find the test cases in:
>> >>
>> >>
>> >>
>> asterixdb/asterixdb/asterix-app/src/test/resources/runtimets/queries/external-library
>> >>
>> >> As for the difference between scalar, aggregate, and unnest functions,
>> here
>> >> is the way I see it:
>> >> 1. Scalar: one input to one output.
>> >> 2. Aggregate: 0 or more inputs to one output.
>> >> 3. Unnest: one input to 0 or more outputs.
>> >>
>> >> Hope that helps,
>> >> Abdullah.
>> >>
>> >> On Thu, Jun 2, 2016 at 11:40 PM, Xikui Wang  wrote:
>> >>
>> >>> Hi Devs,
>> >>>
>> >>> I want to use a UDF to process the Tweets that I got from the feed, and I
>> >> ran into the
>> >>> following two questions. Hope you guys can help me or point me to the
>> >> right
>> >>> documentation.
>> >>>
>> >>> 1. How to install UDF on instance running from
>> >>> Eclipse+AsterixHyracksIntegrationUtil?
>> >>>
>> >>> The website only mentions how to install it with Managix. I am wondering if
>> >> there
>> >>> is a way for me to install it on an instance running in Eclipse, which is
>> >>> easier for debugging.
>> >>>
>> >>> 2. Implementation of UDF
>> >>>
>> >>> I found several UDFs in
>> >>>
>> >>>
>> >>
>> asterixdb/asterix-external-data/src/test/java/org/apache/asterix/external/library,
>> >>> like SumFunction and ParseTweetFunction. I assume that if I want to implement a
>> new
>> >>> UDF, it needs to implement the IExternalScalarFunction interface and be
>> >> put
>> >>> under the same directory? I also found the 'aggregate' and 'unnest' types,
>> >> which
>> >>> are not implemented yet. Just out of curiosity, what is the difference
>> >>> between them?
>> >>>
>> >>> Thanks in advance! :)
>> >>>
>> >>> Best,
>> >>> Xikui
>> >>>
>> >>
>>
>>
>>