AsterixDB Development Meeting Summary: 04/05/24

2024-04-05 Thread Wail Alkowaileet
Top level announcements/discussions:
- Preparing report

Issue roundup:

New:
- 3372 Enforce min. in COPY TO's max-objects-per-file
- 3373 Unlimited storage EPIC
- 3374 Contextualize BufferCache Ops
- 3375 Local disk caching APIs

Fixed/Implemented:
- 3372
- 3374

Open APEs:

Atomic statements:
- Drafting

Open type datasets:
- Drafting

Unlimited storage:
- Awaiting approval

Other development status/discussion not in issues/APEs:
- Perf experiments are ongoing for the schema inference patch
- Hybrid group-by experiments are ongoing
- Unlimited storage (block-level caching for S3-backed storage) is under
development

-- 

*Regards,*
Wail Alkowaileet


[APE] Unlimited Storage: Local disk caching in cloud deployment

2024-04-03 Thread Wail Alkowaileet
In the current cloud deployment, users are limited by the disk space of the
cluster's nodes. However, the blob storage services provided by cloud
providers (e.g., S3) can virtually store an "unlimited" amount of data.
Thus, AsterixDB can provide the means to store beyond what the cluster's
local drives can hold.

In this proposal, we want to extend AsterixDB's capability to allow the
local drives to act as a cache, instead of a mirror image of what's stored
in the cloud. By "as a cache" we mean that files and pages can be
retrieved/persisted and removed (evicted) from the local drives, according
to some policy.

The aim of this proposal is to describe and implement a mechanism called "*Weep
and Sweep*", named for the two phases that run when the amount of data
in the cloud exceeds the space of the cluster's local disks.

Weep

When the disk is pressured (the pressure threshold can be configured), the
system will start to "weep" and devise a plan for what should be "evicted"
according to some statistics and policies, *which are not solidified yet
and are still a work in progress.*
Sweep

After "weeping", a sweep operation will take place and start evicting what
the weep's plan considers as evictable. Depending on the index type
(primary/secondary) and the storage format (row/column), the smallest
evictable unit can differ. The following table shows the smallest unit of
evictable unit:
*Index Type* *Evictable*
Metadata Indexes (e.g., Dataset, ..etc) Not evictable
Secondary indexes Evicted as a whole
Primary Indexes (Row) Evicted as a whole
Primary Indexes (Columnar) Columns (or columns’ pages)
Further Considerations

   - Columnar primary indexes will never be downloaded as a whole
   - Instead, columns will be streamed from the cloud (if accessed for
   the first time) and persisted to the local disk if necessary
   - We are considering providing a mechanism to prefetch the columns of
   the next mega-leaf node
   <https://www.vldb.org/pvldb/vol15/p2085-alkowaileet.pdf>. The hope here
   is to mask any latencies when reading columns from the cloud
   - Depending on the disk pressure and the operation, the system can
   determine whether columns streamed from the cloud are "worthy" of being
   cached locally. For example, if columns are read in a merge operation, it
   might not be "wise" to persist these columns, as their on-disk component
   is going to be deleted at the end of the merge operation. Thus, it might
   be "better" to dedicate the free space on disk to the newly
   created/merged component.
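
To make the two phases concrete, below is a minimal, hypothetical sketch of
how a weep/sweep cycle could be driven. All class and method names
(LocalDisk, EvictionCandidate, etc.) are illustrative assumptions rather
than AsterixDB APIs, and the least-recently-accessed policy is a stand-in
for the statistics/policies that are still a work in progress:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a weep/sweep cycle; names are illustrative,
// not actual AsterixDB APIs.
final class WeepAndSweepSketch {

    // A candidate unit of eviction: a whole secondary/row index, or a
    // column (or column page) of a columnar primary index.
    record EvictionCandidate(String path, long sizeBytes, long lastAccessMillis, boolean evictable) {}

    interface LocalDisk {
        long capacityBytes();
        long usedBytes();
        List<EvictionCandidate> cachedUnits();
        void evict(EvictionCandidate c); // delete the local copy; the cloud copy remains
    }

    private final LocalDisk disk;
    private final double pressureThreshold; // e.g., 0.9 == start weeping at 90% full

    WeepAndSweepSketch(LocalDisk disk, double pressureThreshold) {
        this.disk = disk;
        this.pressureThreshold = pressureThreshold;
    }

    // Weep: when the disk is pressured, devise a plan of what to evict.
    // The "policy" here is simply least-recently-accessed first (an assumption).
    List<EvictionCandidate> weep() {
        long limit = (long) (disk.capacityBytes() * pressureThreshold);
        long excess = disk.usedBytes() - limit;
        List<EvictionCandidate> plan = new ArrayList<>();
        if (excess <= 0) {
            return plan; // no pressure, nothing to plan
        }
        List<EvictionCandidate> candidates = new ArrayList<>(disk.cachedUnits());
        candidates.removeIf(c -> !c.evictable()); // e.g., metadata indexes are never evictable
        candidates.sort(Comparator.comparingLong(EvictionCandidate::lastAccessMillis));
        long reclaimed = 0;
        for (EvictionCandidate c : candidates) {
            if (reclaimed >= excess) {
                break;
            }
            plan.add(c);
            reclaimed += c.sizeBytes();
        }
        return plan;
    }

    // Sweep: evict what the weep's plan considers evictable.
    void sweep(List<EvictionCandidate> plan) {
        plan.forEach(disk::evict);
    }
}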


Multiple aspects (such as the evictable units and policies) of this APE are
not solidified yet, but the core concepts are in place and are ready for
the community's vote :)

EPIC: ASTERIXDB-3373 <https://issues.apache.org/jira/browse/ASTERIXDB-3373>
-- 

*Regards,*
Wail Alkowaileet


Re: [VOTE] Release Apache AsterixDB 0.9.8.2 and Apache Hyracks 0.9.8.2 (RC0)

2024-03-02 Thread Wail Alkowaileet
+1


*Regards,*
Wail Alkowaileet


On Fri, Mar 1, 2024 at 13:50 Ian Maxon  wrote:

> Hi everyone,
>
> Please verify and vote on the latest stabilization release of Apache
> AsterixDB.
>
> The change that produced this release is up on Gerrit:
>
> https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18187
>
> The release artifacts are as follows:
>
> AsterixDB Source
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.8.2-source-release.zip
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.8.2-source-release.zip.asc
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.8.2-source-release.zip.sha512
>
> SHA512:
>
> 9424c56301e538639170b2c8064a8cb16cb9278423bfca4b3475cb36ce8067bc49f001a6dc3970e6b45b1b3bd275537ef0a54145ca829e26242f82c7e05d8a58
>
> Hyracks Source
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.8.2-source-release.zip
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.8.2-source-release.zip.asc
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.8.2-source-release.zip.sha512
>
> SHA512:
>
> f17e004ad4b3dcf74db4582ae064189c27ae8f9f3aa352be1d11b936ac1bd679607318f5dfbd86e3d915f9f2e4550271db8ad144b3a41e182738047e398ba477
>
> AsterixDB NCService Installer:
>
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.8.2-binary-assembly.zip
>
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.8.2-binary-assembly.zip.asc
>
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.8.2-binary-assembly.zip.sha512
>
> SHA512:
>
> cb7a8e941090332f99589257f14913b4dcc9c15a9a3c4395c558fae68a6570c0bf14824be3aa1a672917a61a4c07988cacd705be12012e246d161ca8b69949c5
>
> The KEYS file containing the PGP keys used to sign the release can be
> found at
>
> https://dist.apache.org/repos/dist/release/asterixdb/KEYS
>
> RAT was executed as part of Maven via the RAT maven plugin, but
> excludes files that are:
>
> - data for tests
> - procedurally generated,
> - or source files which come without a header mentioning their license,
>  but have an explicit reference in the LICENSE file.
>
>
> The vote is open for 72 hours, or until the necessary number of votes
> (3 +1) has been reached.
>
> Please vote
> [ ] +1 release these packages as Apache AsterixDB 0.9.8.2 and
> Apache Hyracks 0.3.8.2
> [ ] 0 No strong feeling either way
> [ ] -1 do not release one or both packages because ...
>


Re: [VOTE] Release Apache AsterixDB 0.9.9 and Apache Hyracks 0.3.9 (RC1)

2024-02-29 Thread Wail Alkowaileet
+1


*Regards,*
Wail Alkowaileet


On Thu, Feb 29, 2024 at 15:25 Ian Maxon  wrote:

> Hi everyone,
>
> Please verify and vote on the latest release of Apache AsterixDB.
>
> The change that produced this release is up on Gerrit:
>
> https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/18186
>
> The release artifacts are as follows:
>
> AsterixDB Source
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.9-source-release.zip
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.9-source-release.zip.asc
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.9-source-release.zip.sha512
>
> SHA512:
>
> f192cae6fbf6f88ba39366ee53a3e7fc23d9661b9b041d524df43fba3fafd244bfa932087119175e4dde65cca78e01f27f78ca31f959d562059485e8faa2c94b
>
> Hyracks Source
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.9-source-release.zip
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.9-source-release.zip.asc
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.9-source-release.zip.sha512
>
> SHA512:
>
> 48d8bbc2757388af50c8c0e9de4a84c383e3a208e0b8d8336f021c662283b8c1078d0f8254dc5c3547d21c95fd7416de2efb26c5c2e93b69a7eebba58084446b
>
>
> AsterixDB NCService Installer:
>
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.9-binary-assembly.zip
>
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.9-binary-assembly.zip.asc
>
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.9-binary-assembly.zip.sha512
>
> SHA512:
>
> fadd92650d312bb9a5c90f9d654d0de1ca2c87dd29aa0d3a93bc2f0f5ba802979108d4974620343473b6c3a5fd6289c7696f07128900e2fe5609d086889bcdc6
>
>
> The KEYS file containing the PGP keys used to sign the release can be
> found at
>
> https://dist.apache.org/repos/dist/release/asterixdb/KEYS
>
> RAT was executed as part of Maven via the RAT maven plugin, but
> excludes files that are:
>
> - data for tests
> - procedurally generated,
> - or source files which come without a header mentioning their license,
>  but have an explicit reference in the LICENSE file.
>
>
> The vote is open for 72 hours, or until the necessary number of votes
> (3 +1) has been reached.
>
> Please vote
> [ ] +1 release these packages as Apache AsterixDB 0.9.9 and
> Apache Hyracks 0.3.9
> [ ] 0 No strong feeling either way
> [ ] -1 do not release one or both packages because ...
>


Re: [VOTE][APE] Compute-Storage Separation (Cloud Mode Deployment)

2023-12-02 Thread Wail Alkowaileet
+1

On Sat, Dec 2, 2023 at 11:25 Till Westmann  wrote:

> +1
>
> > On Dec 2, 2023, at 11:23, Glenn Justo Galvizo  wrote:
> >
> > +1 from me as well.
> >
> >> On Dec 2, 2023, at 10:27, Ian Maxon  wrote:
> >>
> >> +1
> >>
>  On Dec 1, 2023 at 12:28:23, Murtadha Al-Hubail 
> wrote:
> >>>
> >>> Each AsterixDB cluster today consists of one or more Node Controllers
> (NC)
> >>> where the data is stored and processed. Each NC has a predefined set of
> >>> storage partitions (iodevices). When data is ingested into the system,
> the
> >>> data is hash-partitioned across the total number of storage partitions
> in
> >>> the cluster. Similarly, when the data is queried, each NC will start as
> >>> many threads as the number of storage partitions it has to read and
> process
> >>> the data in parallel. While this shared-nothing architecture has its
> >>> advantages, it has its drawbacks too. One major drawback is the time
> needed
> >>> to scale the cluster. Adding a new NC to an existing cluster of (n)
> nodes
> >>> means writing a completely new copy of the data which will now be
> >>> hash-partitioned to the new total number of storage partitions of (n +
> 1)
> >>> nodes. This operation could potentially take several hours or even days
> >>> which is unacceptable in the cloud age.
> >>>
> >>> This APE is about adding a new deployment (cloud) mode to AsterixDB by
> >>> implementing compute-storage separation to take advantage of the
> elasticity
> >>> of the cloud. This will require the following:
> >>>
> >>> 1. Moving from the dynamic data partitioning described earlier to a
> static
> >>> data partitioning based on a configurable, but fixed during a cluster's
> >>> life, number of storage partitions.
> >>> 2. Introducing the concept of a "compute partition" where each NC will
> have
> >>> a fixed number of compute partitions. This number could potentially be
> >>> based on the number of CPU cores it has.
> >>>
> >>> This will decouple the number of storage partitions being processed on
> an
> >>> NC from the number of its compute partitions.
> >>>
> >>> When an AsterixDB cluster is deployed using the cloud mode, we will do
> the
> >>> following:
> >>>
> >>> - The Cluster Controller will maintain a map containing the assignment
> of
> >>> storage partitions to compute partitions.
> >>> - New writes will be written to the NC's local storage and uploaded to
> an
> >>> object store (e.g. AWS S3) which will be used as a highly available
> shared
> >>> filesystem between NCs.
> >>> - On queries, each NC will start as many threads as its compute
> partitions
> >>> to process its currently assigned storage partitions.
> >>> - On scaling operations, we will simply update the assignment map and
> NCs
> >>> will lazily cache any data of newly assigned storage partitions from
> the
> >>> object store.
> >>>
> >>> Improvement tickets:
> >>> Static data partitioning:
> >>>
> https://urldefense.com/v3/__https://issues.apache.org/jira/browse/ASTERIXDB-3144__;!!CzAuKJ42GuquVTTmVmPViYEvSg!OAhVXrR7KC09sldpj5RPLxWAUgdr8MVlQ9bIpT5QK76KPmMlxnjFGChosdZpBbe81Z_KZI7COEEXdi5a$
> >>> Compute-Storage Separation
> >>>
> https://urldefense.com/v3/__https://issues.apache.org/jira/browse/ASTERIXDB-3196__;!!CzAuKJ42GuquVTTmVmPViYEvSg!OAhVXrR7KC09sldpj5RPLxWAUgdr8MVlQ9bIpT5QK76KPmMlxnjFGChosdZpBbe81Z_KZI7COGLN6MWp$
> >>>
> >>> Please vote on this APE. We'll keep this open for 72 hours and pass
> with
> >>> either 3 votes or a majority of positive votes.
> >>>
>
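
The lazy-scaling story above hinges on the storage-to-compute assignment map
maintained by the Cluster Controller. Below is a minimal sketch of that idea
under assumed names (PartitionAssignmentSketch, ComputePartition, and the
round-robin placement are illustrative, not the actual implementation):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the storage-to-compute assignment map idea; names
// are assumptions, not the actual Cluster Controller code.
final class PartitionAssignmentSketch {

    record ComputePartition(String nodeId, int partitionId) {}

    // The number of storage partitions is fixed for the cluster's life;
    // only their assignment to compute partitions changes.
    private final Map<Integer, ComputePartition> assignment = new HashMap<>();

    // Round-robin the fixed storage partitions over all compute partitions.
    void assign(int numStoragePartitions, List<ComputePartition> computePartitions) {
        for (int sp = 0; sp < numStoragePartitions; sp++) {
            assignment.put(sp, computePartitions.get(sp % computePartitions.size()));
        }
    }

    // Scaling is just a reassignment: no data is rewritten, because NCs
    // lazily cache newly assigned partitions from the object store.
    void rebalance(List<ComputePartition> newComputePartitions) {
        int numStoragePartitions = assignment.size();
        assignment.clear();
        assign(numStoragePartitions, newComputePartitions);
    }

    ComputePartition ownerOf(int storagePartition) {
        return assignment.get(storagePartition);
    }
}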


Re: [VOTE] Release Apache AsterixDB JDBC Connector 0.9.8.2 (RC0)

2023-11-22 Thread Wail Alkowaileet
+1

On Wed, Nov 22, 2023 at 8:53 AM Murtadha Hubail  wrote:

> +1
> 
> From: Peeyush Gupta 
> Sent: Wednesday, November 22, 2023 7:50:18 PM
> To: dev@asterixdb.apache.org 
> Subject: Re: [VOTE] Release Apache AsterixDB JDBC Connector 0.9.8.2 (RC0)
>
> +1
>
> On 2023/11/20 23:30:51 Ian Maxon wrote:
> > Hi everyone,
> >
> > Please verify and vote on the latest release of the AsterixDB JDBC
> > connector.
> >
> > This release includes compatibility with the new Database keyword and
> level.
> >
> > The change that produced this release is up on Gerrit:
> >
> > https://asterix-gerrit.ics.uci.edu/c/asterixdb-clients/+/17973
> >
> > The release artifacts are as follows:
> >
> > JDBC Driver Source
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-jdbc-0.9.8.2-source-release.zip
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-jdbc-0.9.8.2-source-release.zip.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-jdbc-0.9.8.2-source-release.zip.sha512
> >
> > SHA512:
> >
> 4ab33e535d5189d229d6f73082a73ffda00215a19a4ffbd0f37e4185c1917427682b68b8bc34bf06860b77c61a9b9efcdeb2810402951531dbc59b562bd7b9ad
> >
> > JDBC Driver Distributable Jar
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-jdbc-driver-0.9.8.2-dist.jar
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-jdbc-driver-0.9.8.2-dist.jar.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-jdbc-driver-0.9.8.2-dist.jar.sha512
> >
> > SHA512:
> >
> ff1bb6783ec2e4c58d5d603933cd74cdf5c2757d2677ac9f752cab048b6d8e46203a854e19fc2c90c6873e03e880b6e0113fce02dfbf89659aa93966e7a29eb9
> >
> > Tableau Connector (TACO file)
> > https://dist.apache.org/repos/dist/dev/asterixdb/asterixdb_jdbc.taco
> > https://dist.apache.org/repos/dist/dev/asterixdb/asterixdb_jdbc.taco.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterixdb_jdbc.taco.sha512
> >
> > SHA512:
> >
> d0e205049962ad1b79ed259d0e513632f440c2c6f46a4be2c733cec0d4bc4c86b12e43c650fe7639fb5621df80628de57fcccb4534acad0b680124e167a009a8
> >
> >
> > The KEYS file containing the PGP keys used to sign the release can be
> > found at
> >
> > https://dist.apache.org/repos/dist/release/asterixdb/KEYS
> >
> > RAT was executed as part of Maven via the RAT maven plugin, but
> > excludes files that are:
> >
> > - data for tests
> > - procedurally generated,
> > - or source files which come without a header mentioning their license,
> >  but have an explicit reference in the LICENSE file.
> >
> >
> > The vote is open for 72 hours, or until the necessary number of votes
> > (3 +1) has been reached.
> >
> > Please vote
> > [ ] +1 release these packages as Apache AsterixDB JDBC Driver 0.9.8.2
> > [ ] 0 No strong feeling either way
> > [ ] -1 do not release one or both packages because ...
> >
> > Thanks!
> >
>


-- 

*Regards,*
Wail Alkowaileet


Re: [APE] Support COPY TO

2023-11-01 Thread Wail Alkowaileet
AsterixDB now supports COPY TO.

On Thu, Oct 26, 2023 at 10:05 AM Till Westmann  wrote:

> Agreed (+1) - this functionality will significantly reduce the pain of
> moving data in and out of object storage.
>
> > On Oct 25, 2023, at 10:04 PM, Mike Carey  wrote:
> >
> > +1  -- Now maybe users will stop trying to retrieve huge results and
> wondering why the UI is choking! :-) This capability is actually long
> overdue.
> >
> > On 10/24/23 9:53 AM, Wail Alkowaileet wrote:
> >> Currently, AsterixDB does not have a clean way to extract query results
> or
> >> dump a dataset to a storage device. The only channel provided currently
> is
> >> the Query Service (i.e., running the query and writing it somehow at the
> >> client side). We need to support a way to write query results (or dump a
> >> dataset) in parallel to a storage device.
> >>
> >> To illustrate, say we want to do the following:
> >>
> >>> USE CopyToDataverse;
> >>> COPY ColumnDataset
> >>> TO localfs
> >>> PATH("localhost:///media/backup/CopyToResult")
> >>> WITH {
> >>> "format" : "json"
> >>> };
> >> In this example, the data in ColumnDataset will be written in each node
> at
> >> the provided path localhost:///media/backup/CopyToResult. Simply, each
> node
> >> will write its own partitions for the data stored in ColumnDataset
> locally.
> >> The written files will be in raw JSON format.
> >>
> >> Another example:
> >>
> >>> USE CopyToDataverse;
> >>> COPY (SELECT cd.uid uid,
> >>> cd.sensor_info.name name,
> >>> to_bigint(cd.sensor_info.battery_status)
> >>> battery_status
> >>>  FROM ColumnDataset cd
> >>> ) toWrite
> >>> TO s3
> >>> PATH("CopyToResult/" || to_string(b))
> >>> OVER (
> >>>PARTITION BY toWrite.battery_status b
> >>>ORDER BY toWrite.name
> >>> )
> >>> WITH {
> >>> "format" : "json",
> >>> "compression": "gzip",
> >>> "max-objects-per-file": 100,
> >>> "container": "myBucket",
> >>> "accessKeyId": "",
> >>> "secretAccessKey": "",
> >>> "region": "us-west-2"
> >>> };
> >> The second example shows how to write the result of a query and also
> >> partition the result so that each partition will be written to a certain
> >> path. In this example, we partition by the battery_status (say an
> integer
> >> value from 0 to 100). The final result will be written to myBucket in
> Amazon
> >> S3.
> >>
> >> Each partition will have the path CopyToResult/<battery_status>. For
> >> example, CopyToResult/0, CopyToResult/1, ..., CopyToResult/99, and
> >> CopyToResult/100. This partitioning scheme can be useful if a user
> wants
> >> to exploit dynamic prefixes (external filters) (see ASTERIXDB-3073
> >> <https://issues.apache.org/jira/browse/ASTERIXDB-3073>).
> >>
> >> Additionally, the records in each partition will be ordered by the
> >> sensor_name (toWrite.name). Note that this ordering isn't global but per
> >> partition.
> >>
> >> Also, the written files will be compressed using *gzip*, and each file
> >> will have at most 100 records (*max-objects-per-file*).
> >>
> >> EPIC: ASTERIXDB-3286 <https://issues.apache.org/jira/browse/ASTERIXDB-3286>
>
>

-- 

*Regards,*
Wail Alkowaileet


Re: [VOTE] Changing COPY FROM syntax

2023-11-01 Thread Wail Alkowaileet
The change to the syntax has been merged.

On Fri, Oct 27, 2023 at 11:15 AM Taewoo Kim  wrote:

> +1
>
> interesting fact - MS has a similar syntax COPY INTO (Transact-SQL) - Azure
> Synapse Analytics and Microsoft Fabric | Microsoft Learn
> <
> https://learn.microsoft.com/en-us/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest
> >
>
> Best,
> Taewoo
>
>
> On Fri, Oct 27, 2023 at 11:11 AM Glenn Justo Galvizo 
> wrote:
>
> > +1 from me as well!
> >
> > > On Oct 27, 2023, at 10:15, Till Westmann  wrote:
> > >
> > > +1 this is much nicer
> > >
> > >> On 2023/10/26 05:05:01 Mike Carey wrote:
> > >> PS - I assume the semantics will be UPSERT-based? (Vs. one-time or
> > >> INSERT-based?)
> > >>
> > >>> On 10/24/23 10:16 AM, Wail Alkowaileet wrote:
> > >>> Hi all,
> > >>>
> > >>> I'm proposing to change the current syntax for COPY FROM. The current
> > >>> syntax looks as follows:
> > >>>
> > >>>> COPY Customers
> > >>>> USING localfs (
> > >>>>   ("path"="asterix_nc1://data/nontagged/customerData.json"),
> > >>>>   ("format"="json")
> > >>>> );
> > >>>>
> > >>> This syntax uses the old way of configuring the adapter localfs. In
> our
> > >>> feeds, we use the WITH clause. Another issue is that the current
> > syntax is
> > >>> missing the keyword FROM, which makes it ambiguous if we add support
> > for
> > >>> COPY TO.
> > >>>
> > >>> I propose to change the syntax to be as follows:
> > >>>
> > >>>> COPY Customers
> > >>>> FROM localfs
> > >>>> PATH ("asterix_nc1://data/nontagged/customerData.json")
> > >>>> WITH {
> > >>>> "format": "json"
> > >>>> };
> > >>>>
> > >>> First, the proposed syntax introduces the use of FROM <adapter>.
> > >>> Second, it mandates the use of PATH (instead of having it in the WITH
> > >>> clause). Additionally, the proposed syntax will make both COPY FROM
> and
> > >>> COPY TO less different.
> > >>>
> > >>> Example of COPY TO:
> > >>>
> > >>>> COPY Customers
> > >>>> TO localfs
> > >>>> PATH("localhost:///myData/Customers")
> > >>>> WITH {
> > >>>> "format" : "json"
> > >>>> };
> > >>>>
> >
>


-- 

*Regards,*
Wail Alkowaileet


[VOTE] Changing COPY FROM syntax

2023-10-24 Thread Wail Alkowaileet
Hi all,

I'm proposing to change the current syntax for COPY FROM. The current
syntax looks as follows:

> COPY Customers
> USING localfs (
>   ("path"="asterix_nc1://data/nontagged/customerData.json"),
>   ("format"="json")
> );
>

This syntax uses the old way of configuring the adapter localfs. In our
feeds, we use the WITH clause. Another issue is that the current syntax is
missing the keyword FROM, which makes it ambiguous if we add support for
COPY TO.

I propose to change the syntax to be as follows:

> COPY Customers
> FROM localfs
> PATH ("asterix_nc1://data/nontagged/customerData.json")
> WITH {
> "format": "json"
> };
>

First, the proposed syntax introduces the use of FROM <adapter>.
Second, it mandates the use of PATH (instead of having it in the WITH
clause). Additionally, the proposed syntax will make both COPY FROM and
COPY TO less different.

Example of COPY TO:

> COPY Customers
> TO localfs
> PATH("localhost:///myData/Customers")
> WITH {
> "format" : "json"
> };
>
-- 

*Regards,*
Wail Alkowaileet


[APE] Support COPY TO

2023-10-24 Thread Wail Alkowaileet
Currently, AsterixDB does not have a clean way to extract query results or
dump a dataset to a storage device. The only channel provided currently is
the Query Service (i.e., running the query and writing it somehow at the
client side). We need to support a way to write query results (or dump a
dataset) in parallel to a storage device.

To illustrate, say we want to do the following:

> USE CopyToDataverse;

COPY ColumnDataset
> TO localfs
> PATH("localhost:///media/backup/CopyToResult")
> WITH {
> "format" : "json"
> };

In this example, the data in ColumnDataset will be written in each node at
the provided path localhost:///media/backup/CopyToResult. Simply, each node
will write its own partitions for the data stored in ColumnDataset locally.
The written files will be in raw JSON format.

Another example:

> USE CopyToDataverse;
> COPY (SELECT cd.uid uid,
> cd.sensor_info.name name,
> to_bigint(cd.sensor_info.battery_status)
> battery_status
>  FROM ColumnDataset cd
> ) toWrite
> TO s3
> PATH("CopyToResult/" || to_string(b))
> OVER (
>PARTITION BY toWrite.battery_status b
>ORDER BY toWrite.name
> )
> WITH {
> "format" : "json",
> "compression": "gzip",
> "max-objects-per-file": 100,
> "container": "myBucket",
> "accessKeyId": "",
> "secretAccessKey": "",
> "region": "us-west-2"
> };

The second example shows how to write the result of a query and also
partition the result so that each partition will be written to a certain
path. In this example, we partition by the battery_status (say an integer
value from 0 to 100). The final result will be written to myBucket in Amazon
S3.

Each partition will have the path CopyToResult/<battery_status>. For
example, CopyToResult/0, CopyToResult/1, ..., CopyToResult/99, and
CopyToResult/100. This partitioning scheme can be useful if a user wants
to exploit dynamic prefixes (external filters) (see ASTERIXDB-3073
<https://issues.apache.org/jira/browse/ASTERIXDB-3073>).

Additionally, the records in each partition will be ordered by the
sensor_name (toWrite.name). Note that this ordering isn't global but per
partition.

Also, the written files will be compressed using *gzip*, and each file
will have at most 100 records (*max-objects-per-file*).

EPIC: ASTERIXDB-3286 <https://issues.apache.org/jira/browse/ASTERIXDB-3286>
-- 

*Regards,*
Wail Alkowaileet


Re: Release Apache AsterixDB JDBC Connector 0.9.8.1

2023-09-11 Thread Wail Alkowaileet
+1

On Mon, Sep 11, 2023 at 10:18 AM Murtadha Hubail 
wrote:

> +1
>
> Cheers,
> Murtadha
> 
> From: Peeyush Gupta 
> Sent: Monday, September 11, 2023 7:57:31 PM
> To: dev@asterixdb.apache.org 
> Subject: Re: Release Apache AsterixDB JDBC Connector 0.9.8.1
>
> +1
>
> - Checked hashes
> - Compiled from source code
>
> Thanks,
> Peeyush
>
> On 2023/09/08 21:41:08 Ian Maxon wrote:
> > Hi everyone,
> >
> > Please verify and vote on the latest release of the AsterixDB JDBC
> > connector.
> >
> > This release includes a commit unintentionally missing from release 0.9.8
> >
> > The change that produced this release is up on Gerrit:
> >
> > https://asterix-gerrit.ics.uci.edu/c/asterixdb-clients/+/17768
> >
> > The release artifacts are as follows:
> >
> > JDBC Driver Source
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-jdbc-0.9.8.1-source-release.zip
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-jdbc-0.9.8.1-source-release.zip.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-jdbc-0.9.8.1-source-release.zip.sha512
> >
> > SHA512:
> >
> 4b8d90a06210f367e84a629c61353e4c2ab32f756da80d5228cd9774f3351e17a291a4de0d77051ee63042275a5d4e7425def1565422dfec30bec7829b614fee
> >
> > JDBC Driver Distributable Jar
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-jdbc-driver-0.9.8.1-dist.jar
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-jdbc-driver-0.9.8.1-dist.jar.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-jdbc-driver-0.9.8.1-dist.jar.sha512
> >
> > SHA512:
> >
> 71d9ca50db58c84706c54ac34d841c1cd205f8c7987ef148dc6e25260e7d1609369e3448d3ac9d3c143b6860faab01990ac634582dad428b831825e2f354f4d1
> >
> > Tableau Connector (TACO file)
> > https://dist.apache.org/repos/dist/dev/asterixdb/asterixdb_jdbc.taco
> > https://dist.apache.org/repos/dist/dev/asterixdb/asterixdb_jdbc.taco.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterixdb_jdbc.taco.sha512
> >
> > SHA512:
> >
> 4d6dd17dbcc1cb6aa5557feb255156ca34bfe4b32fe307be4bd6694bb0efe46e4112ae58d2dcff723ed7d1bd7d666fb9e1b2f53e1c4a36b5fabb32f53078ad67
> >
> >
> > The KEYS file containing the PGP keys used to sign the release can be
> > found at
> >
> > https://dist.apache.org/repos/dist/release/asterixdb/KEYS
> >
> > RAT was executed as part of Maven via the RAT maven plugin, but
> > excludes files that are:
> >
> > - data for tests
> > - procedurally generated,
> > - or source files which come without a header mentioning their license,
> >  but have an explicit reference in the LICENSE file.
> >
> >
> > The vote is open for 72 hours, or until the necessary number of votes
> > (3 +1) has been reached.
> >
> > Please vote
> > [ ] +1 release these packages as Apache AsterixDB JDBC Driver 0.9.8.1
> > [ ] 0 No strong feeling either way
> > [ ] -1 do not release one or both packages because ...
> >
> > Thanks!
> >
>


-- 

*Regards,*
Wail Alkowaileet


Re: [VOTE] Release Apache AsterixDB 0.9.7.1 and Hyracks 0.9.7.1 (RC0)

2021-12-11 Thread Wail Alkowaileet
+1 (non-binding)

Verified:
- Hashes
- Signatures
- Build from source
- Ran a sample cluster: ingested and queried "sensors" records


On Sat, Dec 11, 2021 at 6:14 PM Till Westmann  wrote:

>
> +1
>
> Checked the Hyracks and AsterixDB source artifacts:
> - Signatures and hashes correct
> — LICENSE and NOTICE look ok
> - Source files have Apache header
> - No unexpected binary files (checked expected binary files)
> - Can compile from source
>
> Checked the AsterixDB Server binary artifact:
> - Signatures and hashes correct
> — LICENSE and NOTICE look ok
>
> Till
>
> On 11 Dec 2021, at 16:20, Michael Blow wrote:
>
> > The log4j-1.2-api is still at an older version (2.13.1), given that
> > our
> > tests are passing there likely isn't any compatibility
> > issue, but if we end up respinning for some other reason, we might
> > want to
> > consider advancing log4j-1.2-api to 2.15.0.
> >
> > Verified:
> >
> >- source builds
> >- signatures
> >- checksums
> >
> >
> > [X] +1 release these packages as Apache AsterixDB 0.9.7.1 and
> > Apache Hyracks 0.3.7.1
> > [ ] 0 No strong feeling either way
> > [ ] -1 do not release one or both packages because ...
> >
> > On Sat, Dec 11, 2021 at 6:26 PM Ian Maxon  wrote:
> >
> >> Hi everyone,
> >>
> >> Please verify and vote on the latest release of Apache AsterixDB.
> >> This
> >> release is purely a maintenance release to 0.9.7 and contains minimal
> >> changes.
> >>
> >> The change that produced this release is up on Gerrit:
> >>
> >> https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/14504
> >>
> >> The release artifacts are as follows:
> >>
> >> AsterixDB Source
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.7.1-source-release.zip
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.7.1-source-release.zip.asc
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.7.1-source-release.zip.sha256
> >>
> >> SHA256:
> >> a698f6246347592263858af349de206d2636984d040d27bb82770a2a5c6bc0b4
> >>
> >>
> >> Hyracks Source
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.7.1-source-release.zip
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.7.1-source-release.zip.asc
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.7.1-source-release.zip.sha256
> >>
> >> SHA256:
> >> d3b4652aabfee134ea28a92c6fbe5e5ea9091aa623c5ec68d91b0eb5ece755e5
> >>
> >>
> >> AsterixDB NCService Installer:
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.7.1-binary-assembly.zip
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.7.1-binary-assembly.zip.asc
> >>
> >>
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.7.1-binary-assembly.zip.sha256
> >>
> >> SHA256:
> >> d78c6725eed3386a63ca801e4caa5595bdf2b91c8cf993490e3e429ffc42c163
> >>
> >>
> >> The KEYS file containing the PGP keys used to sign the release can be
> >> found at
> >>
> >> https://dist.apache.org/repos/dist/release/asterixdb/KEYS
> >>
> >> RAT was executed as part of Maven via the RAT maven plugin, but
> >> excludes files that are:
> >>
> >> - data for tests
> >> - procedurally generated,
> >> - or source files which come without a header mentioning their
> >> license,
> >>   but have an explicit reference in the LICENSE file.
> >>
> >>
> >> The vote is open for 72 hours, or until the necessary number of votes
> >> (3 +1) has been reached.
> >>
> >> Please vote
> >> [ ] +1 release these packages as Apache AsterixDB 0.9.7.1 and
> >> Apache Hyracks 0.3.7.1
> >> [ ] 0 No strong feeling either way
> >> [ ] -1 do not release one or both packages because ...
> >>
> >> Thanks!
> >>
>


-- 

*Regards,*
Wail Alkowaileet


Re: [VOTE] Accept donation of AsterixDB JDBC Driver

2021-08-23 Thread Wail Alkowaileet
+1

On Sun, Aug 22, 2021 at 23:51 Till Westmann  wrote:

> Hi,
>
> Couchbase would like to donate a JDBC driver for AsterixDB to the Apache
> Software foundation.
>
> [ ] +1 accept the donation and add the driver to the AsterixDB code base
> [ ] +0 no opinion
> [ ] -1 do not accept the donation because...
>
> The vote will be open for 7 days.
>
> Please vote,
> Till
>
-- 

*Regards,*
Wail Alkowaileet


Re: [VOTE] Release Apache AsterixDB 0.9.7 and Hyracks 0.3.7 (RC1)

2021-06-15 Thread Wail Alkowaileet
+1

- Signatures and hashes ok (verified with Ian as the older KEYS were
*expired*)
- Source compilation works.
- Deployed cluster using Ansible and executed a few queries against Parquet
files.



On Thu, Jun 10, 2021 at 3:50 PM Ian Maxon  wrote:

> A small correction, the change is actually at:
>
> https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/11903
>
> On Thu, Jun 10, 2021 at 3:48 PM Ian Maxon  wrote:
> >
> > Hi everyone,
> >
> > Please verify and vote on the latest release of Apache AsterixDB
> >
> > The change that produced this release is up for review on Gerrit:
> >
> > https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/11824
> >
> > The release artifacts are as follows:
> >
> > AsterixDB Source
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.7-source-release.zip
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.7-source-release.zip.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.7-source-release.zip.sha256
> >
> > SHA256: fdcb0396ed9106203656b07203fffa44e34003565f8a55ee982ee89c0b9b0a5b
> >
> >
> > Hyracks Source
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.7-source-release.zip
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.7-source-release.zip.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.7-source-release.zip.sha256
> >
> > SHA256: f4ca4e806f41baa68275d2be40a414066374830b794c9bf811dcebfeefea3587
> >
> >
> > AsterixDB NCService Installer:
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.7-binary-assembly.zip
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.7-binary-assembly.zip.asc
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.7-binary-assembly.zip.sha256
> >
> > SHA256: fb38aed1daaca7faa2aeb8b45ff7ab227e11ca1e92ab4995fb973fefcc794611
> >
> >
> > The KEYS file containing the PGP keys used to sign the release can be
> > found at
> >
> > https://dist.apache.org/repos/dist/release/asterixdb/KEYS
> >
> > RAT was executed as part of Maven via the RAT maven plugin, but
> > excludes files that are:
> >
> > - data for tests
> > - procedurally generated,
> > - or source files which come without a header mentioning their license,
> >   but have an explicit reference in the LICENSE file.
> >
> >
> > The vote is open for 72 hours, or until the necessary number of votes
> > (3 +1) has been reached.
> >
> > Please vote
> > [ ] +1 release these packages as Apache AsterixDB 0.9.7 and
> > Apache Hyracks 0.3.7
> > [ ] 0 No strong feeling either way
> > [ ] -1 do not release one or both packages because ...
> >
> > Thanks!
>


-- 

*Regards,*
Wail Alkowaileet


Re: Parquet binary files in the code-base

2020-09-23 Thread Wail Alkowaileet
They are used as input files for external datasets

On Wed, Sep 23, 2020, 09:24 Wail Alkowaileet  wrote:

> They are used for the integration test.
>
> On Tue, Sep 22, 2020, 22:44 Till Westmann  wrote:
>
>> Hi Wail,
>>
>> Could you provide a bit of context how those binary files are used?
>>
>> Cheers,
>> Till
>>
>> On 23 Sep 2020, at 0:24, Ian Maxon wrote:
>>
>> > I think generally we try to avoid it where possible but it's hard to
>> > sometimes. IMO, as long as the file is small and changes very rarely,
>> > it's fine.
>> > If it's not maybe there's some way to generate it on the fly from a
>> > textual representation?
>> >
>> > On Tue, Sep 22, 2020 at 2:35 PM Wail Alkowaileet 
>> > wrote:
>> >>
>> >> Devs,
>> >>
>> >> Is it ok to have binary files (parquet files) as part of the
>> >> code-base?
>> >>
>> >> --
>> >>
>> >> *Regards,*
>> >> Wail Alkowaileet
>>
>


Re: Parquet binary files in the code-base

2020-09-23 Thread Wail Alkowaileet
They are used for the integration test.

On Tue, Sep 22, 2020, 22:44 Till Westmann  wrote:

> Hi Wail,
>
> Could you provide a bit of context how those binary files are used?
>
> Cheers,
> Till
>
> On 23 Sep 2020, at 0:24, Ian Maxon wrote:
>
> > I think generally we try to avoid it where possible but it's hard to
> > sometimes. IMO, as long as the file is small and changes very rarely,
> > it's fine.
> > If it's not maybe there's some way to generate it on the fly from a
> > textual representation?
> >
> > On Tue, Sep 22, 2020 at 2:35 PM Wail Alkowaileet 
> > wrote:
> >>
> >> Devs,
> >>
> >> Is it ok to have binary files (parquet files) as part of the
> >> code-base?
> >>
> >> --
> >>
> >> *Regards,*
> >> Wail Alkowaileet
>


Parquet binary files in the code-base

2020-09-22 Thread Wail Alkowaileet
Devs,

Is it ok to have binary files (parquet files) as part of the code-base?

-- 

*Regards,*
Wail Alkowaileet


Re: [VOTE] Release Apache AsterixDB 0.9.5 and Hyracks 0.3.5 (RC4)

2020-07-09 Thread Wail Alkowaileet
[ X ] +1 release these packages as Apache AsterixDB 0.9.5 and
Apache Hyracks 0.3.5

- Verified signatures and hashes
- Verified source build with Java 11
- Ingested and queried a few tweets, and everything seems to be working
correctly.

On Wed, Jul 8, 2020 at 7:18 PM Taewoo Kim  wrote:

> [ X] +1 release these packages as Apache AsterixDB 0.9.5 and Apache Hyracks
> 0.3.5
>
> Followed the directions on the following page.
> https://cwiki.apache.org/confluence/display/ASTERIXDB/Release+Verification
>
> [v] Verify signatures and hashes
> [v] Verify that source builds correctly
> [v] Smoke test
>
> Best,
> Taewoo
>
>
> On Wed, Jul 8, 2020 at 6:04 AM Michael Blow 
> wrote:
>
> > [ X ] +1 release these packages as Apache AsterixDB 0.9.5 and
> > Apache Hyracks 0.3.5
> >
> > Checked:
> > - keys, signatures on all packages
> > - SHAs
> > - sanity check of LICENSE / NOTICEs
> > - functional build of source packages
> > - all versions advanced from SNAPSHOT
> >
> >
> > On Mon, Jul 6, 2020 at 6:51 PM Ian Maxon  wrote:
> >
> > > Hi everyone,
> > >
> > > Please verify and vote on the latest release of Apache AsterixDB
> > >
> > > The change that produced this release is up for review on Gerrit:
> > >
> > > https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/7124
> > >
> > > The release artifacts are as follows:
> > >
> > > AsterixDB Source
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5-source-release.zip
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5-source-release.zip.asc
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5-source-release.zip.sha256
> > >
> > > SHA256:09affe9ce5aa75add6c5a75c51505e619f85cb7a87eb3b9d977ac472d5387bd1
> > >
> > > Hyracks Source
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.5-source-release.zip
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.5-source-release.zip.asc
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.5-source-release.zip.sha256
> > >
> > > SHA256:577d2b3da91ebfa37c113bae18561dcbfae0bdd526edee604b747f6044f4a03b
> > >
> > > AsterixDB NCService Installer:
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.5-binary-assembly.zip
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.5-binary-assembly.zip.asc
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.5-binary-assembly.zip.sha256
> > >
> > > SHA256:6854e71fc78f9cfb68b0dc3c61edb5f5c94b09b41f4a8deaf4c2fc9d804abcac
> > >
> > > The KEYS file containing the PGP keys used to sign the release can be
> > > found at
> > >
> > > https://dist.apache.org/repos/dist/release/asterixdb/KEYS
> > >
> > > RAT was executed as part of Maven via the RAT maven plugin, but
> > > excludes files that are:
> > >
> > > - data for tests
> > > - procedurally generated,
> > > - or source files which come without a header mentioning their license,
> > >   but have an explicit reference in the LICENSE file.
> > >
> > >
> > > The vote is open for 72 hours, or until the necessary number of votes
> > > (3 +1) has been reached.
> > >
> > > Please vote
> > > [ ] +1 release these packages as Apache AsterixDB 0.9.5 and
> > > Apache Hyracks 0.3.5
> > > [ ] 0 No strong feeling either way
> > > [ ] -1 do not release one or both packages because ...
> > >
> > > Thanks!
> > >
> >
>


-- 

*Regards,*
Wail Alkowaileet


Re: Subplan: ExtractCommonExpressions

2019-10-19 Thread Wail Alkowaileet
Hi Taewoo,

I might have gotten confused by the comment, but I still don't see why
the code is commented out.
Sub-plans are missing a few optimizations. For example, after
uncommenting this code and fixing InlineSingleReferenceVariableRule
for sub-plans, we get:

Before:


subplan {
  aggregate [$$10] <- [listify($$9)]
  -- AGGREGATE  |LOCAL|
assign [$$9] <- [eq($$22, "Tom")]
-- ASSIGN  |LOCAL|
  assign [$$22] <- [$$30.getField("firstName")]
  -- ASSIGN  |LOCAL|
assign [$$30] <- [$$29.getField("name")]
-- ASSIGN  |LOCAL|
  assign [$$29] <- [$$x.getField("names")]
  -- ASSIGN  |LOCAL|
select (eq($$21, "1"))
-- STREAM_SELECT  |LOCAL|
  nested tuple source
  -- NESTED_TUPLE_SOURCE  |LOCAL|
   }
-- SUBPLAN  |PARTITIONED|
  project ([$$x, $$21, $$25])
  -- STREAM_PROJECT  |PARTITIONED|
assign [$$21, $$25] <-
[$$28.getField("count"), $$28.getField("name")]
-- ASSIGN  |PARTITIONED|
  assign [$$28] <- [$$x.getField("names")]
  -- ASSIGN  |PARTITIONED|


After:


subplan {
  aggregate [$$10] <- [listify($$9)]
  -- AGGREGATE  |LOCAL|
assign [$$9] <-
[eq($$26.getField("firstName"), "Tom")]
-- ASSIGN  |LOCAL|
  select (eq($$22, "1"))
  -- STREAM_SELECT  |LOCAL|
nested tuple source
-- NESTED_TUPLE_SOURCE  |LOCAL|
   }
-- SUBPLAN  |PARTITIONED|
  project ([$$x, $$22, $$26])
  -- STREAM_PROJECT  |PARTITIONED|
assign [$$26, $$22] <- [$$20.getField("name"),
$$20.getField("count")]
-- ASSIGN  |PARTITIONED|
  assign [$$20] <- [$$x.getField("names")]
  -- ASSIGN  |PARTITIONED|

On Sat, Oct 19, 2019 at 11:15 AM Taewoo Kim <wangs...@gmail.com> wrote:

Hi Wail,

I think that's what the comment implies (I could not produce an expression
where it used in a sub-plan and not visible to upper operators?). If you
want to make it happen, a workaround might be introducing a project
operator within the subplan of the subplan operator? Actually, if a
variable is not used, isn't project operator supposed to remove them
automatically by IntroduceProjectsRule?

Best,
Taewoo


On Sat, Oct 19, 2019 at 10:05 AM Wail Alkowaileet <wael@gmail.com>
wrote:

 Hi Dev,

 I'm not sure about the commented-out code in [1]. I could not produce an
 expression where it is used in a sub-plan and not visible to upper
 operators.
 Also, all SQL++ runtime integration tests seem to work just fine.


 [1] ExtractCommonExpressionsRule.java#L192
 <https://github.com/apache/asterixdb/blob/f2c18aa9646238ab2487ce3a964edfe3e61dd6e1/hyracks-fullstack/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/ExtractCommonExpressionsRule.java#L192>

 --

 *Regards,*
 Wail Alkowaileet

-- 

*Regards,*
Wail Alkowaileet


Subplan: ExtractCommonExpressions

2019-10-19 Thread Wail Alkowaileet
Hi Dev,

I'm not sure about the commented-out code in [1]. I could not produce an
expression where it is used in a sub-plan and not visible to upper operators.
Also, all SQL++ runtime integration tests seem to work just fine.


[1] ExtractCommonExpressionsRule.java#L192
<https://github.com/apache/asterixdb/blob/f2c18aa9646238ab2487ce3a964edfe3e61dd6e1/hyracks-fullstack/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/ExtractCommonExpressionsRule.java#L192>

-- 

*Regards,*
Wail Alkowaileet


Re: HEADS UP: storage block compression to be enabled by default

2019-10-12 Thread Wail Alkowaileet
I think that works.

On Fri, Oct 11, 2019 at 11:46 AM Till Westmann  wrote:

> Good question.
> We could also consider to change the test to run uncompressed to
> maintain some test coverage for the uncompressed case.
>
> Thoughts?
>
> Cheers,
> Till
>
> On 11 Oct 2019, at 9:48, Wail Alkowaileet wrote:
>
> > Should we remove [1] and [2]? [1] is now similar to [3] except for the
> > buffer cache size.
> >
> > [1]
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/cc-compression.conf
> > [2]
> >
> https://github.com/apache/asterixdb/blob/5aeba9b475fc714edf953bd88ada7281a2d4937e/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/runtime/SqlppExecutionWithCompresisionTest.java
> > [3]
> >
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/cc.conf
> >
> > On Fri, Oct 11, 2019 at 5:51 AM Michael Blow 
> > wrote:
> >
> >> All,
> >>
> >> The default storage block level compression strategy is changing from
> >> 'none' to 'snappy'.
> >>
> >> Existing datasets will not be affected, but if you want to prevent
> >> new
> >> datasets from being compressed, either specify the compression should
> >> be
> >> none at dataset creation time:
> >>
> >>
> >>
> >>
> >> *create dataset DBLP1(DBLPType)primary key idwith
> >> {"storage-block-compression": {"scheme": "none"}};*
> >>
> >> ...or override the default in the config file to be none:
> >>
> >>
> >> *storage.compression.block = none*
> >>
> >>
> >> I expect this to be merged into master today.
> >>
> >> Thanks,
> >>
> >> -MDB
> >>
> >
> >
> > --
> >
> > *Regards,*
> > Wail Alkowaileet
>


-- 

*Regards,*
Wail Alkowaileet


Re: HEADS UP: storage block compression to be enabled by default

2019-10-11 Thread Wail Alkowaileet
Should we remove [1] and [2]? [1] is now similar to [3] except for the
buffer cache size.

[1]
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/cc-compression.conf
[2]
https://github.com/apache/asterixdb/blob/5aeba9b475fc714edf953bd88ada7281a2d4937e/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/runtime/SqlppExecutionWithCompresisionTest.java
[3]
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/resources/cc.conf

On Fri, Oct 11, 2019 at 5:51 AM Michael Blow  wrote:

> All,
>
> The default storage block level compression strategy is changing from
> 'none' to 'snappy'.
>
> Existing datasets will not be affected, but if you want to prevent new
> datasets from being compressed, either specify the compression should be
> none at dataset creation time:
>
>
>
>
> *create dataset DBLP1(DBLPType)primary key idwith
> {"storage-block-compression": {"scheme": "none"}};*
>
> ...or override the default in the config file to be none:
>
>
> *storage.compression.block = none*
>
>
> I expect this to be merged into master today.
>
> Thanks,
>
> -MDB
>


-- 

*Regards,*
Wail Alkowaileet


Re: [VOTE] Release Apache AsterixDB 0.9.5 and Hyracks 0.3.5 (RC3)

2019-09-12 Thread Wail Alkowaileet
+1

- Signatures and hashes ok.
- NCService binary works.
- Source compilation works.
- Executed the sample cluster. Ingested tweets and run few queries.

On Tue, Sep 3, 2019 at 6:02 PM Ian Maxon  wrote:

> Hi everyone,
>
> Please verify and vote on the latest release of Apache AsterixDB. This
> candidate fixes the binary name and missing Netty notice from RC2.
>
> The change that produced this release and the change to advance the
> version are
> up for review on Gerrit:
>
>
> https://asterix-gerrit.ics.uci.edu/#/q/status:open+owner:%22Jenkins+%253Cjenkins%2540fulliautomatix.ics.uci.edu%253E%22
>
> The release artifacts are as follows:
>
> AsterixDB Source
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5-source-release.zip
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5-source-release.zip.asc
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5-source-release.zip.sha256
>
> SHA256:be41051e803e5ada2c64f608614c6476c6686e043c47a2a0291ccfd25239a679
>
> Hyracks Source
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.5-source-release.zip
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.5-source-release.zip.asc
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.5-source-release.zip.sha256
>
> SHA256:b06fe983aa6837abe3460a157d7600662ec56181a43db317579f5c7ddf9bfc08
>
> AsterixDB NCService Installer:
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5.zip
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5.zip.asc
>
> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.5.zip.sha256
>
>
> SHA256:
>
> The KEYS file containing the PGP keys used to sign the release can be
> found at
>
> https://dist.apache.org/repos/dist/release/asterixdb/KEYS
>
> RAT was executed as part of Maven via the RAT maven plugin, but
> excludes files that are:
>
> - data for tests
> - procedurally generated,
> - or source files which come without a header mentioning their license,
>   but have an explicit reference in the LICENSE file.
>
>
> The vote is open for 72 hours, or until the necessary number of votes
> (3 +1) has been reached.
>
> Please vote
> [ ] +1 release these packages as Apache AsterixDB 0.9.5 and
> Apache Hyracks 0.3.5
> [ ] 0 No strong feeling either way
> [ ] -1 do not release one or both packages because ...
>
> Thanks!
>


-- 

*Regards,*
Wail Alkowaileet


Re: Generate an iteration counter

2019-08-20 Thread Wail Alkowaileet
Also, the function range(x, y)
<https://ci.apache.org/projects/asterixdb/sqlpp/builtins.html#MiscFunctions>
can be helpful. Here's a function *zipWithIndex* which does exactly what you
need:

DROP FUNCTION zipWithIndex@1 IF EXISTS;
CREATE FUNCTION zipWithIndex(array_of_objects) {
SELECT object_merge(array_of_objects[i-1], {"i": i})
FROM range(1, array_length(array_of_objects)) as i
};

SELECT value zipWithIndex(x.B)
FROM root as x;

Output:

[
  [
    { "$1": { "B1": "b111", "B2": "b112", "i": 1 } },
    { "$1": { "B1": "b121", "B2": "b122", "i": 2 } }
  ],
  [
    { "$1": { "B1": "b211", "B2": "b212", "i": 1 } },
    { "$1": { "B1": "b221", "B2": "b222", "i": 2 } }
  ],
  [
    { "$1": { "B1": "b311", "B2": "b312", "i": 1 } },
    { "$1": { "B1": "b321", "B2": "b322", "i": 2 } }
  ]
]


On Tue, Aug 20, 2019 at 3:07 PM Michael Carey  wrote:

> See if the new window function support (see the OVER clause in the SQL++
> Language documentation and the Window Functions section of the Built-in
> Functions documentation for more info) meets your needs?  The rank or
> dense_rank functionality might do the trick  (Eventually SQL++ will
> have positional variable support, but right now it does not; that's in
> progress, as it's slightly tricky in a shared-nothing parallel setting.)
> On 8/16/19 2:53 AM, f...@legsem.com wrote:
>
> Hello everyone, apologies if this is a trivial question.
>
> I am trying to do something like this:
>
> 
> DROP DATAVERSE test IF EXISTS;
> CREATE DATAVERSE test;
> USE test;
>
> CREATE TYPE B AS {
> B1: string,
> B2: string
> };
>
> CREATE TYPE RooTType As{
> id:uuid,
> A: string,
> B:[B]
> };
>
> CREATE DATASET  root (RooTType) PRIMARY KEY id AUTOGENERATED;
>
> INSERT INTO root([
> {
> "A": "a1",
> "B": [{
> "B1": "b111",
> "B2": "b112"
> },
> {
> "B1": "b121",
> "B2": "b122"
> }]
> },
> {
> "A": "a2",
> "B": [{
> "B1": "b211",
> "B2": "b212"
> },
> {
> "B1": "b221",
> "B2": "b222"
> }]
> },
> {
> "A": "a3",
> "B": [{
> "B1": "b311",
> "B2": "b312"
> },
> {
> "B1": "b321",
> "B2": "b322"
>     }]
> }
> ]);
>
> FROM root, root.B as B
> SELECT root.A, (
> FROM B
> LET I = I + 1
> SELECT B.B1, B.B2, I
> ) AS B;
>
> 
>
> Basically I would like I to be an index of the occurrence of B being
> produced. Would be value 1 or 2.
>
> Output should look like this:
>
> [ { "A": "a1", "B": [ { "B1": "b111", "B2": "b112", "I": 1 }, { "B1":
> "b121", "B2": "b122", "I": 2 } ] }
> , { "A": "a2", "B": [ { "B1": "b211", "B2": "b212", "I": 1 }, { "B1":
> "b221", "B2": "b222", "I": 2 } ] }
> , { "A": "a3", "B": [ { "B1": "b311", "B2": "b312", "I": 1 }, { "B1":
> "b321", "B2": "b322", "I": 2 } ] }
>  ]
>
> Is there a way to do that?
>
> Thank you!
>
> Fady
>
>

-- 

*Regards,*
Wail Alkowaileet


LogRecord format change

2018-12-13 Thread Wail Alkowaileet
Dev,

As filed in ASTERIXDB-2491
<https://issues.apache.org/jira/browse/ASTERIXDB-2491>, changing the log
record format to have offsets of type Integer32 instead of Integer16 breaks
backward compatibility for customers who wish to update their binaries.

There are two log types that would be affected by this change: *UPDATE* and
*FILTER*. I found that the safest approach is to introduce two new log
types, *UPDATE_V2* and *FILTER_V2*, both of which would have field-end
offsets of type Integer32 instead of Integer16 for their values. To prevent
such issues in the future, I was thinking of introducing a version number
for the value format. This would allow us to change the value's format
without introducing a new log type.
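
To make the version-number idea concrete, here is a minimal sketch of
reading a version-tagged log value. The layout, constants, and class name
are illustrative assumptions, not the actual LogRecord implementation:

import java.nio.ByteBuffer;

// Sketch of decoding a versioned log value; layout and names are
// illustrative assumptions, not the actual LogRecord implementation.
final class VersionedLogValueSketch {

    static final byte VALUE_FORMAT_V1 = 1; // Integer16 field-end offsets
    static final byte VALUE_FORMAT_V2 = 2; // Integer32 field-end offsets

    // Reads the field-end offsets according to the leading version byte,
    // so a future format change only adds a new version, not a new log type.
    static int[] readFieldEndOffsets(ByteBuffer buf, int numFields) {
        byte version = buf.get();
        int[] offsets = new int[numFields];
        for (int i = 0; i < numFields; i++) {
            switch (version) {
                case VALUE_FORMAT_V1 -> offsets[i] = buf.getShort(); // old 16-bit offsets
                case VALUE_FORMAT_V2 -> offsets[i] = buf.getInt(); // new 32-bit offsets
                default -> throw new IllegalStateException("unknown value format: " + version);
            }
        }
        return offsets;
    }
}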

Thoughts?

-- 

*Regards,*
Wail Alkowaileet


Re: LSM Filter thread safety

2018-12-09 Thread Wail Alkowaileet
Actually, that was just from looking at the code. After investigating, I can
confirm it's not thread safe. I filed an issue at ASTERIXDB-2493
<https://issues.apache.org/jira/browse/ASTERIXDB-2493>.
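
For context, the hazard with an unsynchronized min/max update is the classic
check-then-act race. A simplified illustration (hypothetical code with an
integer key, not the actual LSMComponentFilter):

// Simplified illustration (not the actual LSMComponentFilter code) of the
// check-then-act race in an unsynchronized min update.
final class FilterRaceSketch {
    private volatile Integer min; // the component filter's minimum key (simplified)

    // Two threads can both pass the check before either writes, so the
    // smaller key can end up overwritten by the larger one.
    void updateMinUnsafe(int key) {
        if (min == null || key < min) { // check
            min = key; // act (another thread may interleave here)
        }
    }

    // One possible fix: make the check-then-act atomic.
    synchronized void updateMinSafe(int key) {
        if (min == null || key < min) {
            min = key;
        }
    }
}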

On Sun, Dec 9, 2018 at 10:31 AM Ian Maxon  wrote:

> If my memory serves correctly the locking is done outside the filter
> modification, by virtue of the other locks around component modifications.
> So there shouldn’t be any within the filters themselves. Did you notice
> unusual behavior?
>
> > On Dec 8, 2018, at 21:21, Wail Alkowaileet  wrote:
> >
> > Dev,
> >
> > Is the in-memory LSMCompnentFilter (min and max) tuples thread safe? I
> > could not notice any locking mechanism that guarantees they are thread
> safe?
> >
> > --
> >
> > *Regards,*
> > Wail Alkowaileet
>


-- 

*Regards,*
Wail Alkowaileet


LSM Filter thread safety

2018-12-08 Thread Wail Alkowaileet
Dev,

Are the in-memory LSMComponentFilter (min and max) tuples thread safe? I
could not find any locking mechanism that guarantees they are.

-- 

*Regards,*
Wail Alkowaileet


Re: Question about the difference between array functions and collection function implementation

2018-02-20 Thread Wail Alkowaileet
Maybe something related to the function mapping:

org.apache.asterix.lang.sqlpp.util.FunctionMapUtil
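
If so, the symptom (coll_stddev resolving while array_stddev does not) would
be consistent with the array_ form being resolved through a table of known
aggregate names. A hypothetical sketch of that kind of mapping, not the
actual FunctionMapUtil logic:

import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of prefix-based aggregate name mapping; the real
// logic lives in org.apache.asterix.lang.sqlpp.util.FunctionMapUtil.
final class AggregateNameMappingSketch {

    // Aggregates the rewriter knows about; an added aggregate (e.g.,
    // "stddev") stays undefined until it is registered here as well.
    private static final Set<String> KNOWN_AGGREGATES = Set.of("avg", "max", "min", "sum", "count");

    static Optional<String> mapArrayFormToInternal(String callName) {
        if (callName.startsWith("array_")) {
            String agg = callName.substring("array_".length());
            if (KNOWN_AGGREGATES.contains(agg)) {
                return Optional.of("agg-" + agg); // internal name (assumed)
            }
        }
        return Optional.empty(); // falls through to "function ... is not defined"
    }
}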

On Tue, Feb 20, 2018 at 11:34 AM, James Fang <jfang...@ucr.edu> wrote:

> Hi,
>
> Recently, I have noticed that the aggregation functions I have been
> implementing only work as collection functions, and they produce errors
> when run as array functions. This only affects the new aggregate
> functions I have added and does not affect the existing aggregate
> functions (avg, max, min). Is there something specific I have to do to make
> these functions work as array functions?
>
> For example:
> *My Functions:*
> coll_stddev( [1.0, 2.0, 3.0] ) --> works
> array_stddev( [1.0, 2.0, 3.0] ) - --> undefined function.
>
> *Exisitng Functions:*
> coll_avg( [1.0, 2.0, 3.0] ) --> works
> array_avg( [1.0, 2.0, 3.0] ) - --> works
>
>
> I have only modified or added files in:
> 1) New FunctionDescriptors and AggregateFunctions
> 2) BuiltinFunctions.java
> 3) FunctionCollection.java
> 4) New Typecomputer.
>
> Stack trace of undefined function:
> 11:15:48.177 [HttpExecutor(port:19001)-13] ERROR org.apache.asterix - function Default.array_stddev@1 is not defined
> org.apache.asterix.common.exceptions.CompilationException: function Default.array_stddev@1 is not defined
> at org.apache.asterix.lang.common.util.FunctionUtil.retrieveUsedStoredFunctions(FunctionUtil.java:144) ~[classes/:?]
> at org.apache.asterix.lang.sqlpp.rewrites.SqlppQueryRewriter.inlineDeclaredUdfs(SqlppQueryRewriter.java:226) ~[classes/:?]
> at org.apache.asterix.lang.sqlpp.rewrites.SqlppQueryRewriter.rewrite(SqlppQueryRewriter.java:131) ~[classes/:?]
> at org.apache.asterix.api.common.APIFramework.reWriteQuery(APIFramework.java:199) ~[classes/:?]
> at org.apache.asterix.app.translator.QueryTranslator.rewriteCompileQuery(QueryTranslator.java:1894) ~[classes/:?]
> at org.apache.asterix.app.translator.QueryTranslator.lambda$handleQuery$2(QueryTranslator.java:2373) ~[classes/:?]
> at org.apache.asterix.app.translator.QueryTranslator.createAndRunJob(QueryTranslator.java:2496) ~[classes/:?]
> at org.apache.asterix.app.translator.QueryTranslator.deliverResult(QueryTranslator.java:2406) ~[classes/:?]
> at org.apache.asterix.app.translator.QueryTranslator.handleQuery(QueryTranslator.java:2385) ~[classes/:?]
> at org.apache.asterix.app.translator.QueryTranslator.compileAndExecute(QueryTranslator.java:381) ~[classes/:?]
> at org.apache.asterix.api.http.server.ApiServlet.post(ApiServlet.java:168) [classes/:?]
> at org.apache.hyracks.http.server.AbstractServlet.handle(AbstractServlet.java:92) [classes/:?]
> at org.apache.hyracks.http.server.HttpRequestHandler.handle(HttpRequestHandler.java:71) [classes/:?]
> at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:56) [classes/:?]
> at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:37) [classes/:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_131]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
>



-- 

*Regards,*
Wail Alkowaileet


Re: Comparison semantics for complex types

2017-12-29 Thread Wail Alkowaileet
What I meant is that the user should explicitly call deep_equal, so we
should not implicitly call it when we find we're comparing two complex
types. Therefore, I think we should throw an exception when an expression x
= y is found and x and y are complex types (and probably guide the user to
use deep_equal instead).
The reason is that when the compiler generates the plan for a join using
deep_equal, it picks nested-loop join instead of hash join.
But if the type is only known at runtime (open type) and we implicitly call
deep_equal, then we're not sure which join algorithm we should pick, as x
and y can be of any type.

For a hash join, we need hash(x) and hash(y) to be equal whenever x = y.
And I was thinking that it's a bit complex to have a hash function for
complex types (what would be the hash code for a multiset of objects?)

Sorry for my bad explanation :-)
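
For reference, a rough sketch of the deep-equality semantics Mike describes
below (an illustration only, not the built-in deep-equal implementation):
arrays compare by cardinality and element order, bags ignore order, and
records compare field by field, all recursively:

import java.util.*;

final class DeepEqualSketch {
    // Arrays: same cardinality, element-by-element equality in order.
    static boolean arraysEqual(List<?> a, List<?> b) {
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            if (!Objects.deepEquals(a.get(i), b.get(i))) return false;
        }
        return true;
    }

    // Bags/multisets: same cardinality, order-independent matching
    // (quadratic matching is fine for a sketch).
    static boolean bagsEqual(Collection<?> a, Collection<?> b) {
        if (a.size() != b.size()) return false;
        List<Object> remaining = new ArrayList<>(b);
        for (Object x : a) {
            boolean matched = false;
            for (Iterator<Object> it = remaining.iterator(); it.hasNext(); ) {
                if (Objects.deepEquals(x, it.next())) { it.remove(); matched = true; break; }
            }
            if (!matched) return false;
        }
        return true;
    }

    // Records: same field names, field-by-field equality.
    static boolean recordsEqual(Map<String, ?> a, Map<String, ?> b) {
        if (!a.keySet().equals(b.keySet())) return false;
        for (String k : a.keySet()) {
            if (!Objects.deepEquals(a.get(k), b.get(k))) return false;
        }
        return true;
    }
}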


On Fri, Dec 29, 2017 at 8:47 PM, Taewoo Kim <wangs...@gmail.com> wrote:

> I have two questions. How would you want to compare two complex objects?
> And why do we need to do a hash?
>
> On Fri, Dec 29, 2017 at 20:31 Wail Alkowaileet <wael@gmail.com> wrote:
>
> > I think we should not call deep_equal implicitly when comparing objects,
> > arrays or multisets.
> > One reason is that we don't want to do hash join where the key is a
> complex
> > type (i.e what would be the hash function?).
> >
> > On Fri, Dec 29, 2017 at 10:24 AM, Taewoo Kim <wangs...@gmail.com> wrote:
> >
> > > @Heri: I'm sorry for not mentioning your deep_equal function. Yeah,
> > indeed,
> > > we have your function. I checked BuiltinFunctions and found the
> function
> > > named "deep-equal". So, we need to explicitly use that function to
> > conduct
> > > such comparison? If so, could you revise Wail's query? And it would be
> > nice
> > > if AsterixDB can call that function when it tries to compare arrays.
> > >
> > > Best,
> > > Taewoo
> > >
> > > On Fri, Dec 29, 2017 at 8:59 AM, Heri Ramampiaro <heri...@gmail.com>
> > > wrote:
> > >
> > > > Is this similar to the “deep_equal” function I implemented a while
> ago?
> > > >
> > > > -heri
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Dec 29, 2017, at 17:23, Mike Carey <dtab...@gmail.com> wrote:
> > > > >
> > > > > Indeed - we need it someday!  (Sooner rather than later would be
> > nice.)
> > > > It basically needs to work like it does in languages like Python, I
> > > think.
> > > > (Cardinality and element by element equality for arrays, cardinality
> > and
> > > > order-independent equality for bags, field by field equality for
> > records,
> > > > and recursively through all of them.)
> > > > >
> > > > >
> > > > >> On 12/28/17 11:14 PM, Taewoo Kim wrote:
> > > > >> If I remember correctly, we don't support deep equality comparison
> > in
> > > > >> AsterixDB yet.
> > > > >>
> > > > >> Best,
> > > > >> Taewoo
> > > > >>
> > > > >> On Thu, Dec 28, 2017 at 9:19 PM, Wail Alkowaileet <
> > wael@gmail.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >>> Hi Devs,
> > > > >>>
> > > > >>> Currently we have an inconsistent behavior regarding the
> > comparators:
> > > > >>>
> > > > >>> In join, we allow such operation
> > > > >>>
> > > > >>> SELECT *
> > > > >>> FROM [[1],[2],[3]] array1, [[1],[2],[3]] array2
> > > > >>> WHERE array1 = array2
> > > > >>>
> > > > >>> In select, an exception is thrown
> > > > >>> SELECT *
> > > > >>> FROM [[1],[2],[3]] array1
> > > > >>> WHERE array1 = [1]
> > > > >>>
> > > > >>> Error ASX0004: Unsupported type: comparison operations (>, >=, <,
> > and
> > > > <=)
> > > > >>> cannot process input type array
> > > > >>>
> > > > >>> What should be the semantics for such operations?
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>>
> > > > >>> *Regards,*
> > > > >>> Wail Alkowaileet
> > > > >>>
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > *Regards,*
> > Wail Alkowaileet
> >
>



-- 

*Regards,*
Wail Alkowaileet


Re: Comparison semantics for complex types

2017-12-29 Thread Wail Alkowaileet
I think we should not call deep_equal implicitly when comparing objects,
arrays or multisets.
One reason is that we don't want to do a hash join where the key is a complex
type (i.e., what would be the hash function?).

On Fri, Dec 29, 2017 at 10:24 AM, Taewoo Kim <wangs...@gmail.com> wrote:

> @Heri: I'm sorry for not mentioning your deep_equal function. Yeah, indeed,
> we have your function. I checked BuiltinFunctions and found the function
> named "deep-equal". So, we need to explicitly use that function to conduct
> such comparison? If so, could you revise Wail's query? And it would be nice
> if AsterixDB can call that function when it tries to compare arrays.
>
> Best,
> Taewoo
>
> On Fri, Dec 29, 2017 at 8:59 AM, Heri Ramampiaro <heri...@gmail.com>
> wrote:
>
> > Is this similar to the “deep_equal” function I implemented a while ago?
> >
> > -heri
> >
> > Sent from my iPhone
> >
> > > On Dec 29, 2017, at 17:23, Mike Carey <dtab...@gmail.com> wrote:
> > >
> > > Indeed - we need it someday!  (Sooner rather than later would be nice.)
> > It basically needs to work like it does in languages like Python, I
> think.
> > (Cardinality and element by element equality for arrays, cardinality and
> > order-independent equality for bags, field by field equality for records,
> > and recursively through all of them.)
> > >
> > >
> > >> On 12/28/17 11:14 PM, Taewoo Kim wrote:
> > >> If I remember correctly, we don't support deep equality comparison in
> > >> AsterixDB yet.
> > >>
> > >> Best,
> > >> Taewoo
> > >>
> > >> On Thu, Dec 28, 2017 at 9:19 PM, Wail Alkowaileet <wael@gmail.com
> >
> > >> wrote:
> > >>
> > >>> Hi Devs,
> > >>>
> > >>> Currently we have an inconsistent behavior regarding the comparators:
> > >>>
> > >>> In join, we allow such operation
> > >>>
> > >>> SELECT *
> > >>> FROM [[1],[2],[3]] array1, [[1],[2],[3]] array2
> > >>> WHERE array1 = array2
> > >>>
> > >>> In select, an exception is thrown
> > >>> SELECT *
> > >>> FROM [[1],[2],[3]] array1
> > >>> WHERE array1 = [1]
> > >>>
> > >>> Error ASX0004: Unsupported type: comparison operations (>, >=, <, and
> > <=)
> > >>> cannot process input type array
> > >>>
> > >>> What should be the semantics for such operations?
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> *Regards,*
> > >>> Wail Alkowaileet
> > >>>
> > >
> >
>



-- 

*Regards,*
Wail Alkowaileet


Comparison semantics for complex types

2017-12-28 Thread Wail Alkowaileet
Hi Devs,

Currently we have inconsistent behavior regarding the comparators:

In a join, we allow such an operation:

SELECT *
FROM [[1],[2],[3]] array1, [[1],[2],[3]] array2
WHERE array1 = array2

In a select, an exception is thrown:
SELECT *
FROM [[1],[2],[3]] array1
WHERE array1 = [1]

Error ASX0004: Unsupported type: comparison operations (>, >=, <, and <=)
cannot process input type array

What should be the semantics for such operations?


-- 

*Regards,*
Wail Alkowaileet


Re: [VOTE] Release Apache AsterixDB 0.9.3 and Hyracks 0.3.3 (RC0)

2017-12-21 Thread Wail Alkowaileet
One thing I'm not sure about: there are pom.xml.versionsBackup files in the
AsterixDB modules.

On Thu, Dec 21, 2017 at 11:39 AM, Wail Alkowaileet <wael@gmail.com>
wrote:

> +1
> Downloaded
> Verified signatures and hashes
> Verified source build + ran unit tests and integration tests
>
> On Thu, Dec 21, 2017 at 11:11 AM, Taewoo Kim <wangs...@gmail.com> wrote:
>
>> +1
>>
>> Downloaded
>> Verified signatures and hashes
>> Verified the source build
>> Ran a local sample cluster and issued some queries
>>
>> PS:
>> https://cwiki.apache.org/confluence/display/ASTERIXDB/Release+Verification
>> will be helpful to do these. :-)
>>
>>
>> Best,
>> Taewoo
>>
>> On Wed, Dec 20, 2017 at 10:14 PM, Mike Carey <dtab...@gmail.com> wrote:
>>
>> > +1
>> >
>> > Downloaded and ran the local version - walked through the tutorial docs
>> > (the SQL++ primer) - found and filed one issue in those that we should
>> go
>> > ahead and (finally :-)) fix.
>> >
>> >
>> >
>> > On 11/20/17 5:07 PM, Ian Maxon wrote:
>> >
>> >> Hi everyone,
>> >>
>> >> Please verify and vote on the 4th release of Apache AsterixDB
>> >>
>> >> The change that produced this release and the change to advance the
>> >> version are
>> >> up for review here:
>> >>
>> >> https://asterix-gerrit.ics.uci.edu/#/c/2170/
>> >> https://asterix-gerrit.ics.uci.edu/#/c/2171
>> >>
>> >> To check out the release, simply fetch the review and check out the
>> >> fetch head like so:
>> >>
>> >> git fetch https://asterix-gerrit.ics.uci.edu:29418/asterixdb
>> >> refs/changes/70/2070/1 && git checkout FETCH_HEAD
>> >>
>> >>
>> >> AsterixDB Source
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.3-source-release.zip
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.3-source-release.zip.asc
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/apache-asterixdb-0.9.3-source-release.zip.sha1
>> >>
>> >> SHA1:52ecae081b5d4ef8e7cabcd6531471c408a0a7ac
>> >>
>> >> Hyracks Source
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.3-source-release.zip
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.3-source-release.zip.asc
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/apache-hyracks-0.3.3-source-release.zip.sha1
>> >>
>> >> SHA1:1457b140e61a11da8caa6da75cbeba7c553371de
>> >>
>> >> AsterixDB NCService Installer:
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.3-binary-assembly.zip
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.3-binary-assembly.zip.asc
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-server-0.9.3-binary-assembly.zip.sha1
>> >>
>> >> SHA1:f05574389ac10a7da9696b4435ad53a5f6c0053a
>> >>
>> >> AsterixDB Managix Installer
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-installer-0.9.3-binary-assembly.zip
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-installer-0.9.3-binary-assembly.zip.asc
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-installer-0.9.3-binary-assembly.zip.sha1
>> >>
>> >> SHA1:ef308b80441ac2c9437f9465dab3decd35b30189
>> >>
>> >> AsterixDB YARN Installer
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-yarn-0.9.3-binary-assembly.zip
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-yarn-0.9.3-binary-assembly.zip.asc
>> >> https://dist.apache.org/repos/dist/dev/asterixdb/asterix-yarn-0.9.3-binary-assembly.zip.sha1
>> >>
>> >> SHA1:e85c09dd8ff18902503868626bdee301184e4310
>> >>
>> >> Additionally, a staged maven repository is available at:
>> >>
>> >> https://repository.apache.org/content/repositories/orgapacheasterix-1036/
>> >>
>> >> The KEYS file containing the PGP keys used to sign the release can be
>> >> found at
>> >>
>> >> https://dist.apache.org/repos/dist/release/asterixdb/KEYS
>> >>
>> >> RAT was executed as part of Maven via the RAT maven plugin, but
>> >> excludes files that are:
>> >>
>> >> - data for tests
>> >> - procedurally generated,
>> >> - or source files which come without a header mentioning their license,
>> >>but have an explicit reference in the LICENSE file.
>> >>
>> >>
>> >> The vote is open for 72 hours, or until the necessary number of votes
>> >> (3 +1) has been reached.
>> >>
>> >> Please vote
>> >> [ ] +1 release these packages as Apache AsterixDB 0.9.3 and
>> >> Apache Hyracks 0.3.3
>> >> [ ] 0 No strong feeling either way
>> >> [ ] -1 do not release one or both packages because ...
>> >>
>> >> Thanks!
>> >>
>> >
>> >
>>
>
>
>
> --
>
> *Regards,*
> Wail Alkowaileet
>



-- 

*Regards,*
Wail Alkowaileet


[Discuss] Inlining assign operator

2017-12-07 Thread Wail Alkowaileet
Hi Devs,

I've been in the Algebricks vicinity lately, and I think there are a few
things we can do to reduce the plan size and probably the execution time. I
will file a JIRA issue for the other things I noticed.

First I want to discuss the current use of the Assign operator as I need it
for my current work.

Let's see an example:
*-- Query:*

SELECT t.text as text, t.place.full_name as city
FROM Tweets as t
WHERE t.retweet_count > 10
AND spatial_intersect (t.geo.coordinates.coordinates,
create_rectangle(create_point(-107.27, 33.06), create_point(-89.1,
38.9)));

*-- Plan:*

distribute result [$$19]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
project ([$$19])
-- STREAM_PROJECT  |PARTITIONED|
  assign [$$19] <- [{"text": $$t.getField("text"), "city":
$$25.getField("full_name")}]
  -- ASSIGN  |PARTITIONED|
project ([$$t, $$25])
-- STREAM_PROJECT  |PARTITIONED|
  select (and(gt($$t.getField("retweet_count"), 10),
spatial-intersect($$27.getField("coordinates"), rectangle: { p1: point: {
x: -107.27, y: 33.06 }, p2: point: { x: -89.1, y: 38.9 }})))
  -- STREAM_SELECT  |PARTITIONED|
assign [$$27, $$25] <-
[$$t.getField("geo").getField("coordinates"), $$t.getField("place")]
-- ASSIGN  |PARTITIONED|
  project ([$$t])
  -- STREAM_PROJECT  |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
  data-scan []<-[$$20, $$t] <- TwitterDataverse.Tweets
  -- DATASOURCE_SCAN  |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
  empty-tuple-source
  -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

*-- Observation:*

- In this example, *assign [$$27, $$25]* evaluates
*$$t.getField("geo").getField("coordinates")* ($$27) even though it might
not be used (short-circuited in the AND).
- Similarly, because *assign [$$27, $$25]* evaluates *$$t.getField("place")*
($$25) much earlier, the size of project ([$$t, $$25]) is greater than that
of project ([$$t]), given that $$25 can be evaluated from $$t.
- We can see that the assign does not do anything good in this case and
probably should be removed (see the illustration below).
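
For illustration, removing that assign would let the select evaluate
*$$t.getField("geo").getField("coordinates").getField("coordinates")*
directly inside the second conjunct, so the nested field access only runs
when the retweet_count predicate passes (this rewritten expression is
hand-derived from the plan above, not actual optimizer output).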

There are two policies, but I'm not sure which one is better:
1- Aggressively push down field accesses to fit more tuples per frame, but
possibly do unnecessary evaluation, as in the example above.
2- Push down the SELECT and evaluate only the expressions the SELECT needs
(including any shared subexpressions), then do the field accesses
afterwards. But this might fit fewer tuples per frame.

Also:
1- An assign whose output is used only once should be inlined (inline if
the upper operator can do scalar evaluation, such as select/assign). **Some
plans have two consecutive assigns.

I'm leaning toward (2) for the reason that IScalarEvaluators are chained
and work on a per-tuple basis (almost an iterator model within a frame) and
can be more expensive in terms of function calls.

Any suggestions?
-- 

*Regards,*
Wail Alkowaileet


Re: Primary key lookup plan

2017-12-03 Thread Wail Alkowaileet
Got the issue...
if the primary key type is not compatible with the predicate type, the
lookup turns into a scan.

Thanks Taewoo!

On Sun, Dec 3, 2017 at 4:08 PM, Taewoo Kim <wangs...@gmail.com> wrote:

> From Line 531
> https://github.com/apache/asterixdb/blob/master/
> asterixdb/asterix-algebra/src/main/java/org/apache/asterix/
> optimizer/rules/am/BTreeAccessMethod.java
>
>
> Best,
> Taewoo
>
> On Sun, Dec 3, 2017 at 4:05 PM, Taewoo Kim <wangs...@gmail.com> wrote:
>
> > My understanding is that if a select condition can be covered by the
> > primary key (i.e., only contains the primary key condition and B+Tree can
> > be utilized), then only unnest-map should survive.
> >
> >
> > Best,
> > Taewoo
> >
> > On Sun, Dec 3, 2017 at 4:03 PM, Chen Luo <cl...@uci.edu> wrote:
> >
> >> I don't think it's the case...I tried on my local env, and it's using a
> >> primary index lookup instead of scan. Can you make sure the spelling of
> >> the
> >> primary key is correct?
> >>
> >> On Sun, Dec 3, 2017 at 3:49 PM, Wail Alkowaileet <wael@gmail.com>
> >> wrote:
> >>
> >> > Hi Devs,
> >> >
> >> > *For the given query:*
> >> >
> >> > SELECT VALUE t.text
> >> > FROM ITweets as t
> >> > WHERE t.tid = 100
> >> >
> >> > *The optimized plan:*
> >> >
> >> > distribute result [$$6]
> >> > -- DISTRIBUTE_RESULT  |PARTITIONED|
> >> >   exchange
> >> >   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> >> > project ([$$6])
> >> > -- STREAM_PROJECT  |PARTITIONED|
> >> >   assign [$$6] <- [$$t.getField("text")]
> >> >   -- ASSIGN  |PARTITIONED|
> >> > project ([$$t])
> >> > -- STREAM_PROJECT  |PARTITIONED|
> >> >   select (eq($$7, 100))
> >> >   -- STREAM_SELECT  |PARTITIONED|
> >> > exchange
> >> > -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> >> >   data-scan []<-[$$7, $$t] <- FlatDataverse.ITweets
> >> >   -- DATASOURCE_SCAN  |PARTITIONED|
> >> > exchange
> >> > -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> >> >   empty-tuple-source
> >> >   -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
> >> >
> >> > Do we always do a scan and then filter the result, even though the
> query
> >> > predicate is on the primary key?
> >> > --
> >> >
> >> > *Regards,*
> >> > Wail Alkowaileet
> >> >
> >>
> >
> >
>



-- 

*Regards,*
Wail Alkowaileet


Primary key lookup plan

2017-12-03 Thread Wail Alkowaileet
Hi Devs,

*For the given query:*

SELECT VALUE t.text
FROM ITweets as t
WHERE t.tid = 100

*The optimized plan:*

distribute result [$$6]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
project ([$$6])
-- STREAM_PROJECT  |PARTITIONED|
  assign [$$6] <- [$$t.getField("text")]
  -- ASSIGN  |PARTITIONED|
project ([$$t])
-- STREAM_PROJECT  |PARTITIONED|
  select (eq($$7, 100))
  -- STREAM_SELECT  |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
  data-scan []<-[$$7, $$t] <- FlatDataverse.ITweets
  -- DATASOURCE_SCAN  |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
  empty-tuple-source
  -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

Do we always do a scan and then filter the result, even though the query
predicate is on the primary key?
-- 

*Regards,*
Wail Alkowaileet


Re: Polygon validation

2017-11-30 Thread Wail Alkowaileet
For the Jackson parser implementation for GeoJSON [1]...

A polygon should start and end at the same point (as in [2]), and polygons
with holes are not allowed.

[1] https://asterix-gerrit.ics.uci.edu/#/c/2076/
[2] https://tools.ietf.org/html/rfc7946#page-9
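
A minimal sketch of that ring-closure rule (illustrative only, not the
actual parser code): per RFC 7946, a polygon ring needs at least four
positions, and the first and last positions must be identical.

final class GeoJsonRingSketch {
    // ring[i] = {longitude, latitude}
    static boolean isValidRing(double[][] ring) {
        if (ring.length < 4) {
            return false; // three distinct points plus the closing point
        }
        double[] first = ring[0];
        double[] last = ring[ring.length - 1];
        return first[0] == last[0] && first[1] == last[1];
    }
}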

On Thu, Nov 30, 2017 at 11:07 AM, Wail Alkowaileet <wael@gmail.com>
wrote:

> Thanks Sattam... :-)
>
> @Ahmed, I would be grateful for your input
>
> On Thu, Nov 30, 2017 at 7:43 AM, Mike Carey <dtab...@gmail.com> wrote:
>
>> @Ahmed?  Since you're the new defacto CGO (Chief Geo Officer) in
>> AsterixDB - do you happen to know?  (I know you are on the GeoJSON side -
>> but - not sure if you looked at the old polygon support/code at first?)
>>
>>
>> On 11/30/17 12:57 AM, Sattam Alsubaiee wrote:
>>
>>> Both convex and concave polygons are supported. As far as I remember,
>>> complex polygons such as self interesting polygons and polygons with
>>> holes
>>> are not supported. Not sure if they are supported now.
>>>
>>> Sattam
>>>
>>> On Nov 30, 2017 3:50 AM, "Wail Alkowaileet" <wael@gmail.com> wrote:
>>>
>>> Hi Dev,
>>>>
>>>> What's the polygon definition we have in AsterixDB?
>>>> I'm a little bit confused. The only constraint we have is that the
>>>> polygon
>>>> must have at least 3 points. I remember Sattam was mentioning something
>>>> like only "convex" (not sure) polygons are allowed. But it seems there
>>>> are
>>>> no enforcement to any constraint.
>>>>
>>>> --
>>>>
>>>> *Regards,*
>>>> Wail Alkowaileet
>>>>
>>>>
>>
>
>
> --
>
> *Regards,*
> Wail Alkowaileet
>



-- 

*Regards,*
Wail Alkowaileet


Polygon validation

2017-11-29 Thread Wail Alkowaileet
Hi Dev,

What's the polygon definition we have in AsterixDB?
I'm a little bit confused. The only constraint we have is that the polygon
must have at least 3 points. I remember Sattam mentioning something
like only "convex" (not sure) polygons being allowed. But it seems there is
no enforcement of any constraint.

-- 

*Regards,*
Wail Alkowaileet


Re: SPARSOBJECT type tag

2017-11-28 Thread Wail Alkowaileet
I had the same question. As Mike mentioned, "They were future thoughts".

https://mail-archives.apache.org/mod_mbox/asterixdb-dev/201711.mbox/browser

On Tue, Nov 28, 2017 at 4:14 PM, abdullah alamoudi <bamou...@gmail.com>
wrote:

> Devs,
> I found a type tag that is SPARSOBJECT.
> Does anybody know what that is?
>
> Thanks,
> Abdullah.
>



-- 

*Regards,*
Wail Alkowaileet


Re: Change in merge policy syntax

2017-11-27 Thread Wail Alkowaileet
+1

On Nov 27, 2017 19:24, "abdullah alamoudi"  wrote:

> Dear devs,
> We would like to get your input on changing the syntax of the merge policy
> definition.
>
> Current syntax:
>prefix_merge (("number"="123"),("size"="456"));
>
> Proposed syntax
>  {"compaction policy": {"name": "prefix_merge", "parameters": {"number": 123,"size": 456}}};
>
> Advantages:
> 1. Compaction and policy are not keywords anymore.
> 2. Use JSON for key-value pairs instead of a strange syntax.
>
> Thoughts?


Re: Unused types in ATypeTag

2017-11-18 Thread Wail Alkowaileet
I see :)
Thanks!

On Sat, Nov 18, 2017 at 7:23 AM, Mike Carey <dtab...@gmail.com> wrote:

> They were future thoughts.  
>
> On Nov 17, 2017 5:15 PM, "Wail Alkowaileet" <wael@gmail.com> wrote:
>
> > Hi all,
> >
> > There are a few types that have never been used (e.g., ENUM, TYPE, unsigned
> > integers). Are they still there for legacy reasons/backward compatibility?
> >
> > --
> >
> > *Regards,*
> > Wail Alkowaileet
> >
>



-- 

*Regards,*
Wail Alkowaileet


Unused types in ATypeTag

2017-11-17 Thread Wail Alkowaileet
Hi all,

There are a few types that have never been used (e.g., ENUM, TYPE, unsigned
integers). Are they still there for legacy reasons/backward compatibility?

-- 

*Regards,*
Wail Alkowaileet


Re: Re: Adapting TimSort into AsterixDB/Hyracks

2017-10-28 Thread Wail Alkowaileet
P.S Spark implementation is under Apache.
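
For context, the adaptation Chen describes below amounts to re-targeting
TimSort's primitive operations; a rough sketch of the abstraction involved
(assumed names for illustration, not the actual Hyracks interfaces):

interface InMemoryTupleSortOps {
    int getTupleCount();
    // Compares two serialized tuples in place, via the configured
    // binary comparators, without deserializing them into objects.
    int compare(int leftTupleIndex, int rightTupleIndex);
    // Swaps only the tuple-pointer entries, never the tuple bytes.
    void swap(int leftTupleIndex, int rightTupleIndex);
}

TimSort's object assignments, swaps, and comparisons would then be
rewritten against such operations on tuple pointers into binary frames.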

On Sat, Oct 28, 2017 at 9:41 AM, Wail Alkowaileet <wael@gmail.com>
wrote:

> Android has an implementation:
> https://github.com/retrostreams/android-retrostreams/blob/master/src/main/java/java9/util/TimSort.java
>
> Spark has ported it
> https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/TimSort.java
>
> We can customize it for AsterixDB comparators.
>
> On Sat, Oct 28, 2017 at 9:28 AM, Chen Luo <cl...@uci.edu> wrote:
>
>> I don't know whether there is an easy way for us to directly reuse the
>> TimSort in the JDK, since it's designed for sorting objects in an array,
>> while in Hyracks we don't have explicit object creation and we sort
>> everything in main memory. So what I did is copy its source code and
>> replace all object assignments/swaps/comparisons with Hyracks in-memory
>> operations.
>>
>> Best regards,
>> Chen Luo
>>
>> On Sat, Oct 28, 2017 at 12:07 AM, 李文海 <8...@whu.edu.cn> wrote:
>>
>> > I believe reusing the JDK as far as possible could be better. BTW,
>> > TimSort is better than others by 1x when records are locally ordered.
>> > best
>> >
>> > On 2017-10-28 14:38:21, "abdullah alamoudi" <bamou...@gmail.com> wrote:
>> >
>> > >While I have no answer to the question of legality, this sounds great.
>> > >
>> > >~Abdullah.
>> > >
>> > >> On Oct 27, 2017, at 9:20 PM, Chen Luo <cl...@uci.edu> wrote:
>> > >>
>> > >> Hi devs,
>> > >>
>> > >> I have adapted the TimSort algorithm used in JDK (java.util.TimSort)
>> > into
>> > >> Hyracks, which gives 10-20% performance improvements on random data.
>> It
>> > >> will be more useful if the input data is partially sorted, e.g.,
>> primary
>> > >> keys fetched from secondary index scan, which I haven't got time to
>> > >> experiment with.
>> > >>
>> > >> *Before going any further, is it legal to adapt some algorithm
>> > >> implementation from JDK into our codebase? *I saw the JDK
>> implementation
>> > >> itself is adopted from
>> > >> http://svn.python.org/projects/python/trunk/Objects/listsort.txt as
>> > well.
>> > >>
>> > >> Best regards,
>> > >> Chen Luo
>> > >
>> >
>> >
>>
>
>
>
> --
>
> *Regards,*
> Wail Alkowaileet
>



-- 

*Regards,*
Wail Alkowaileet


Re: PrimaryIndexOperationTracker

2017-10-23 Thread Wail Alkowaileet
Thanks Abdullah!
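
A tiny sketch of the per-partition idea discussed below (assumed names for
illustration, not the actual tracker classes): with one monitor per
partition, inserts into different partitions no longer serialize on the
same lock.

final class PerPartitionTrackers {
    private final Object[] trackers;

    PerPartitionTrackers(int numPartitions) {
        trackers = new Object[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
            trackers[i] = new Object();
        }
    }

    // Contention is now limited to operations on the same partition.
    void enterComponents(int partition, Runnable critical) {
        synchronized (trackers[partition]) {
            critical.run();
        }
    }
}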

On Mon, Oct 23, 2017 at 7:15 PM, abdullah alamoudi <bamou...@gmail.com>
wrote:

> Hi Wail,
> There is no fundamental reason why it is one. In fact, it has been on our
> todo for a long time to make it one per partition.
>
> Cheers,
> Abdullah.
>
> > On Oct 23, 2017, at 7:14 PM, Wail Alkowaileet <wael@gmail.com>
> wrote:
> >
> > Dear devs,
> >
> > I have a question regarding the opTracker. Currently, we initialize one
> > opTracker per dataset in every NC.
> >
> > My question is: why is it per dataset and not per partition? Are there
> > transactional constraints for that?
> >
> > From what I can see, the opTracker can create a lot of contention when
> > there are many IO devices. For instance, each insert will call
> > *LSMHarness.getAndEnterComponents()* [1], which does
> > *synchronized(opTracker)*. That means (correct me if I'm wrong) inserts
> > are going to serialize the *enterComponents()* part among partitions.
> >
> > [1]
> > https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMHarness.java#L86
> >
> > --
> >
> > *Regards,*
> > Wail Alkowaileet
>
>


-- 

*Regards,*
Wail Alkowaileet


Re: Strange error trying to run Asterix master

2017-10-04 Thread Wail Alkowaileet
If someone runs into this again, the solution is (see the note after the steps):

1- Go to asterix-runtime
2- mvn clean <-- must clean first.
3- mvn -DskipTests install
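
(Note: the stack traces below point at the generated evaluator classes,
the *$_Gen* descriptors produced during the build; plausibly, that is why
the clean must come first, so stale generated class files under
asterix-runtime's target directory are removed before rebuilding.)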

On Fri, Sep 29, 2017 at 3:26 PM, Chen Luo <cl...@uci.edu> wrote:

> Hi Steven,
>
> I was using Eclipse (on mac) to debug AsterixDB's code, and things work
> well for me. After switching to another branch, I think we need to run "mvn
> clean install" to rebuild the class files, and in the meantime refresh
> the workspace in Eclipse to rebuild things there.
>
> Best regards,
> Chen Luo
>
> On Fri, Sep 29, 2017 at 11:56 AM, Steven Jacobs <sjaco...@ucr.edu> wrote:
>
> > I'm on build 1.8.0_65-b17. I've switched to Intellij, and the problem
> > doesn't occur there, so it seems to be related to Eclipse specifically.
> > Steven
> >
> > On Thu, Sep 28, 2017 at 10:22 PM, Michael Blow <mblow.apa...@gmail.com>
> > wrote:
> >
> > > What JVM is this?  Try Oracle latest Java 8 if not already using.
> > >
> > > -MDB
> > >
> > > On Fri, Sep 29, 2017 at 12:37 AM Steven Jacobs <sjaco...@ucr.edu>
> wrote:
> > >
> > > > If only that worked for me :( I have even tried deleting the m2
> > > repository
> > > > cache completely.
> > > > Steven
> > > >
> > > > On Thu, Sep 28, 2017 at 8:19 PM Wail Alkowaileet <wael@gmail.com
> >
> > > > wrote:
> > > >
> > > > > I got the same issue before. I did "clean project" and the issue
> > seems
> > > to
> > > > > be resolved.
> > > > >
> > > > > On Thu, Sep 28, 2017 at 2:26 PM, Steven Jacobs <sjaco...@ucr.edu>
> > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > > I've been stuck for some time now trying to get master Asterix to
> > > debug
> > > > > > correctly for me in Eclipse on my machine. It seems to deal with
> > the
> > > > > class
> > > > > > generation being done by maven, but that's as far as I can see so
> > > far.
> > > > No
> > > > > > one I've talked to has a similar issue, so I was wondering if
> > anyone
> > > > from
> > > > > > the community at large has had such an issue. It manifests itself
> > > when
> > > > > > trying to create the evaluator for a function. The stack trace is
> > > > attached
> > > > > > below. If anyone has seen such an issue, I would love to get any
> > > advice
> > > > > you
> > > > > > may have.
> > > > > >
> > > > > >
> > > > > > java.lang.VerifyError: Bad return type
> > > > > >
> > > > > > Exception Details:
> > > > > >
> > > > > >   Location:
> > > > > >
> > > > > >
> > > > > > org/apache/asterix/runtime/evaluators/functions/records/RecordMergeDescriptor$_EvaluatorFactoryGen.access$0(Lorg/apache/asterix/runtime/evaluators/functions/records/RecordMergeDescriptor$_EvaluatorFactoryGen;)Lorg/apache/asterix/runtime/evaluators/functions/records/RecordMergeDescriptor;
> > > > > > @4: areturn
> > > > > >
> > > > > >   Reason:
> > > > > >
> > > > > > Type 'org/apache/asterix/runtime/evaluators/functions/records/RecordMergeDescriptor$_Gen' (current frame, stack[0]) is not assignable to 'org/apache/asterix/runtime/evaluators/functions/records/RecordMergeDescriptor' (from method signature)
> > > > > >
> > > > > >   Current Frame:
> > > > > >
> > > > > > bci: @4
> > > > > >
> > > > > > flags: { }
> > > > > >
> > > > > > locals: { 'org/apache/asterix/runtime/evaluators/functions/records/RecordMergeDescriptor$_EvaluatorFactoryGen' }
> > > > > >
> > > > > > stack: {
> > > > > > 'org/apache/asterix/runtime/evaluators/

Re: Strange error trying to run Asterix master

2017-09-28 Thread Wail Alkowaileet
ator.QueryTranslator.rewriteCompileInsertUpsert(QueryTranslator.java:1864)
> at org.apache.asterix.app.translator.QueryTranslator.lambda$0(QueryTranslator.java:1752)
> at org.apache.asterix.app.translator.QueryTranslator.handleInsertUpsertStatement(QueryTranslator.java:1778)
> at org.apache.asterix.app.translator.QueryTranslator.compileAndExecute(QueryTranslator.java:336)
> at org.apache.asterix.api.http.server.ApiServlet.post(ApiServlet.java:162)
> at org.apache.hyracks.http.server.AbstractServlet.handle(AbstractServlet.java:78)
> at org.apache.hyracks.http.server.HttpRequestHandler.handle(HttpRequestHandler.java:70)
> at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:55)
> at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>



-- 

*Regards,*
Wail Alkowaileet


Re: Error when building on a new machine

2017-08-25 Thread Wail Alkowaileet
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.changeRec(ConstantFoldingRule.java:259)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.visitScalarFunctionCallExpression(ConstantFoldingRule.java:185)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.visitScalarFunctionCallExpression(ConstantFoldingRule.java:153)
> >>> at org.apache.hyracks.algebricks.core.algebra.expressions.ScalarFunctionCallExpression.accept(ScalarFunctionCallExpression.java:55)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.changeRec(ConstantFoldingRule.java:259)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.visitScalarFunctionCallExpression(ConstantFoldingRule.java:185)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.visitScalarFunctionCallExpression(ConstantFoldingRule.java:153)
> >>> at org.apache.hyracks.algebricks.core.algebra.expressions.ScalarFunctionCallExpression.accept(ScalarFunctionCallExpression.java:55)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.changeRec(ConstantFoldingRule.java:259)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.visitScalarFunctionCallExpression(ConstantFoldingRule.java:185)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.visitScalarFunctionCallExpression(ConstantFoldingRule.java:153)
> >>> at org.apache.hyracks.algebricks.core.algebra.expressions.ScalarFunctionCallExpression.accept(ScalarFunctionCallExpression.java:55)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.changeRec(ConstantFoldingRule.java:259)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.visitScalarFunctionCallExpression(ConstantFoldingRule.java:185)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.visitScalarFunctionCallExpression(ConstantFoldingRule.java:153)
> >>> at org.apache.hyracks.algebricks.core.algebra.expressions.ScalarFunctionCallExpression.accept(ScalarFunctionCallExpression.java:55)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.transform(ConstantFoldingRule.java:163)
> >>> at org.apache.hyracks.algebricks.core.algebra.operators.logical.AbstractAssignOperator.acceptExpressionTransform(AbstractAssignOperator.java:67)
> >>> at org.apache.asterix.optimizer.rules.ConstantFoldingRule.rewritePost(ConstantFoldingRule.java:150)
> >>> at org.apache.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewriteOperatorRef(AbstractRuleController.java:126)
> >>> at org.apache.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewriteOperatorRef(AbstractRuleController.java:100)
> >>> at org.apache.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewriteOperatorRef(AbstractRuleController.java:100)
> >>> at org.apache.hyracks.algebricks.compiler.rewriter.rulecontrollers.SequentialFixpointRuleController.rewriteWithRuleCollection(SequentialFixpointRuleController.java:53)
> >>> at org.apache.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.runOptimizationSets(HeuristicOptimizer.java:102)
> >>> at org.apache.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.optimize(HeuristicOptimizer.java:82)
> >>> at org.apache.hyracks.algebricks.compiler.api.HeuristicCompilerFactoryBuilder$1$1.optimize(HeuristicCompilerFactoryBuilder.java:90)
> >>> at org.apache.asterix.api.common.APIFramework.compileQuery(APIFramework.java:267)
> >>> at org.apache.asterix.app.translator.QueryTranslator.rewriteCompileQuery(QueryTranslator.java:1833)
> >>> at org.apache.asterix.app.translator.QueryTranslator.lambda$handleQuery$1(QueryTranslator.java:2306)
> >>> at org.apache.asterix.app.translator.QueryTranslator.createAndRunJob(QueryTranslator.java:2406)
> >>> at org.apache.asterix.app.translator.QueryTranslator.deliverResult(QueryTranslator.java:2339)
> >>> at org.apache.asterix.app.translator.QueryTranslator.handleQuery(QueryTranslator.java:2318)
> >>> at org.apache.asterix.app.translator.QueryTranslator.compileAndExecute(QueryTranslator.java:370)
> >>> at org.apache.asterix.app.translator.QueryTranslator.compileAndExecute(QueryTranslator.java:253)
> >>> at org.apache.asterix.api.http.server.ApiServlet.post(ApiServlet.java:153)
> >>> at org.apache.hyracks.http.server.AbstractServlet.handle(AbstractServlet.java:78)
> >>> at org.apache.hyracks.http.server.HttpRequestHandler.handle(HttpRequestHandler.java:70)
> >>> at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:55)
> >>> at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:36)
> >>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >>> at java.lang.Thread.run(Thread.java:745)
> >>> Caused by: java.lang.IllegalStateException: java.lang.ClassNotFoundException: org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexDescriptor$_Gen
> >>> at org.apache.asterix.runtime.functions.FunctionCollection.getGeneratedFunctionDescriptorFactory(FunctionCollection.java:656)
> >>> at org.apache.asterix.runtime.functions.FunctionCollection.<clinit>(FunctionCollection.java:631)
> >>> ... 52 more
> >>> Caused by: java.lang.ClassNotFoundException: org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexDescriptor$_Gen
> >>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>> at org.apache.asterix.runtime.functions.FunctionCollection.getGeneratedFunctionDescriptorFactory(FunctionCollection.java:652)
> >>> ... 53 more
> >>>
> >>> On my machine the code works fine. On a fresh machine it doesn't. When I
> >>> build master first and the given branch next, it works fine. The code
> >>> also runs all the integration tests in gerrit successfully. The error
> >>> occurs in the "getGeneratedFunctionDescriptorFactory" function at the
> >>> line "Class generatedCl = cl.getClassLoader().loadClass(className);",
> >>> where it calls loadClass.
> >>>
> >>> I am completely puzzled by this behaviour in a fresh clone of the
> >>> branch. Any insight into this would be highly helpful. I am unable to
> >>> find the root cause because it occurs only in a fresh clone and when
> >>> master is not built before my branch. Kindly help me figure out the
> >>> issue. Have I changed the structure so badly that I am breaking
> >>> everything? Kindly help.
> >>>
> >>> Thank you.
> >>> Sincerely,
> >>> Riyafa
> >>>
> >>> [1] https://github.com/riyafa/asterixdb
> >>> [2] http://localhost:19001/
> >>> [3] https://asterix-gerrit.ics.uci.edu/#/c/1838/
> >>>
> >>
> >>
>
>


-- 

*Regards,*
Wail Alkowaileet


Re: License issue when using esri geometry api

2017-06-25 Thread Wail Alkowaileet
I can see from the code that there are serde steps as such: Asterix
object (binary) --> JSON (String) --> Esri geometry (Java object) --> Esri
geometry (binary).
I think it would be nice to have a binary-to-binary conversion without
any deserialization (a 4th option).
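
A skeletal sketch of that 4th option (purely illustrative; no such
AsterixDB API exists, and the names are assumptions):

import java.io.DataOutput;
import java.io.IOException;

interface BinaryGeometryConverter {
    // Reads an AsterixDB-serialized geometry and writes the Esri binary
    // form directly, skipping the JSON-string and Java-object hops.
    void convert(byte[] admBytes, int offset, int length, DataOutput esriOut)
            throws IOException;
}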

On Sun, Jun 25, 2017 at 6:36 PM, Mike Carey <dtab...@gmail.com> wrote:

> Agreed that 3 or 4 are what we'll need to do!  (Sigh.)
>
>
>
> On 6/25/17 5:57 AM, Till Westmann wrote:
>
>> Hi Riyafa,
>>
>> I think that the problem is bigger than the failing test. The JSON
>> license itself is not acceptable for inclusion in an Apache artifact
>> [4]. So we cannot use the ESRI API as-is, if we want the GeoJSON
>> functionality to be a non-optional part of AsterixDB.
>>
>> Here are a few options I see:
>> 1) Make GeoJSON an optional part of AsterixDB (separate download from a
>>non-Apache location).
>> 2) Make JSON.org a dependency that is not shipped (i.e. each user would
>>have to download and install those jars separately - and get error
>>messages if the jars are not available).
>> 3) Create a clone/copy of the ESRI API that uses another JSON library.
>> 4) Do all of the parsing independently from the ESRI API.
>>
>> I’m not sure if 1) is a good option as the extensibility in this part
>> of the code might not be sufficient to support this option easily.
>> 2) is technically easier, but it involves an unpleasant user
>> experience.
>> Also, I think that both 1) and 2) are not desirable, as GeoJSON should
>> be supported by vanilla AsterixDB.
>> For 3) and 4) we would need to look into the details to see how much
>> work is required for each of those options and if there are other legal
>> hurdles.
>>
>> Are there other options?
>> Other thoughts/concerns?
>>
>> Cheers,
>> Till
>>
>> [4] https://www.apache.org/legal/resolved.html#category-x
>>
>> On 25 Jun 2017, at 13:57, Riyafa Abdul Hameed wrote:
>>
>> Dear all,
>>>
>>> I implemented a parse_geojson() function[1] using the esri-geometry api[2],
>>> which is apache-2.0 licensed. But this api uses org.json as a dependency.
>>> org.json is licensed under the JSON license, which causes a license issue
>>> in the code I have written[3]. What can I do about this issue?
>>>
>>> [1] https://asterix-gerrit.ics.uci.edu/1838
>>> [2]https://github.com/Esri/geometry-api-java
>>> [3]
>>> https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integ
>>> ration-tests/3290/
>>>
>>> Thank you.
>>> Yours sincerely,
>>> Riyafa
>>>
>>
>


-- 

*Regards,*
Wail Alkowaileet


[COMP] Few questions about Query Optimizer

2017-06-24 Thread Wail Alkowaileet
Hi Devs,

I have a few questions about the query optimizer.

*- Given the query:*
use dataverse TwitterDataverse

for $x in dataset Tweets
where $x.name = "trump"
let $geo := $x.geo
group by $name:=$x.name with $geo
return {"name": $name, "geo":$geo[0].coordinates.coordinates}

*- Logical Plan:*
distribute result [$$10] -- |UNPARTITIONED|
  project ([$$10]) -- |UNPARTITIONED|
assign [$$10] <- [{"name": $$name, "geo": get-item($$9,
0).getField("coordinates").getField("coordinates")}] -- |UNPARTITIONED|
  group by ([$$name := $$x.getField("name")]) decor ([]) {
aggregate [$$9] <- [listify($$geo)] -- |UNPARTITIONED|
  nested tuple source -- |UNPARTITIONED|
 } -- |UNPARTITIONED|
assign [$$geo] <- [$$x.getField("geo")] -- |UNPARTITIONED|
  select (eq($$x.getField("name"), "Alice")) -- |UNPARTITIONED|
unnest $$x <- dataset("Tweets") -- |UNPARTITIONED|
  empty-tuple-source -- |UNPARTITIONED|

*- Optimized Logical Plan:*
distribute result [$$10]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
project ([$$10])
-- STREAM_PROJECT  |PARTITIONED|
  assign [$$10] <- [{"name": $$name, "geo": $$19.getField("coordinates")
}]
  -- ASSIGN  |PARTITIONED|
project ([$$name, $$19])
-- STREAM_PROJECT  |PARTITIONED|
  assign [$$19, $$22] <- [get-item($$9,
0).getField("coordinates"), get-item($$9,
0)]
  -- ASSIGN  |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
  group by ([$$name := $$15]) decor ([]) {
aggregate [$$9] <- [listify($$geo)]
-- AGGREGATE  |LOCAL|
  nested tuple source
  -- NESTED_TUPLE_SOURCE  |LOCAL|
 }
  -- PRE_CLUSTERED_GROUP_BY[$$15]  |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
  order (ASC, $$15)
  -- STABLE_SORT [$$15(ASC)]  |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$15]  |PARTITIONED|
  select (eq($$15, "Alice"))
  -- STREAM_SELECT  |PARTITIONED|
project ([$$geo, $$15])
-- STREAM_PROJECT  |PARTITIONED|
  assign [$$geo, $$15] <- [$$x.getField("geo"),
$$x.getField("name")]
  -- ASSIGN  |PARTITIONED|
project ([$$x])
-- STREAM_PROJECT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
data-scan []<-[$$16, $$x] <-
TwitterDataverse.Tweets
-- DATASOURCE_SCAN  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE  |PARTITIONED|

*- Questions:*
$$22:

   - Why is the variable $$22 produced, although there is no use for it? Is
   it just a harmless bug, or is there some intuition I might be missing?

$$19:

   - It seems that (sometimes) getField function calls are split. Is there a
   reason why that is the case? (There's another example that reproduces the
   same behavior.)
   - That leads to my next question: I see no rule for "FieldAccessNested",
   which could be exploited here to save a few function calls. Can this
   function interfere with other functions/access methods?


-- 

*Regards,*
Wail Alkowaileet


Re: ASTERIXDB-1371: Support the standard GIS objects

2017-05-12 Thread Wail Alkowaileet
One small thing about GeoJSON ...

From the JSON/ADM parser perspective, GeoJSON is still JSON.
"Disambiguating" or distinguishing between them might not be a
straightforward process. Getting GeoJSON parsed into AsterixDB internal
geometries would probably be the first step...

On Fri, May 12, 2017 at 11:15 PM, Mike Carey <dtab...@gmail.com> wrote:

> Just to chime in briefly on the "format" thread - there are two formats to
> keep in mind - the input format (serialized format, e.g., how JSON relates
> to the spatial types) and the internal format (which in AsterixDB is a
> different, more efficient, binary format).  We can look at both in the
> project. Also important are the OPERATIONS that go with the format (i.e.,
> the functions that we'll have in the query language for operating on,
> writing predicates about, etc., the spatial data).
>
> Cheers,
>
> Mike
>
>
>
> On 5/11/17 2:28 PM, Ahmed Eldawy wrote:
>
>> Hi Riyafa,
>>
>> I'm glad you started looking into the details. I agree with Mike that you
>> need to study the current geo support first. It will be highly desirable
>> for your work to be compatible with the current support so that we can
>> seamlessly unify the underlying code without disrupting the high-level
>> API.
>> As for the format, it would be nice to support both WKT and GeoJSON as
>> they
>> are both widely used. However, I think we should start with GeoJSON since
>> it is becoming more popular with modern devices, e.g., GPS, smart phones,
>> and IoT sensors. Later, we can support WKT as well. It will be a matter of
>> writing a different parse function.
>> Esri library [https://github.com/Esri/geometry-api-java] supports both
>> WKT
>> and GeoJSON. You can study it and see if we can use it in our project.
>>
>> Thanks
>> Ahmed
>>
>> On Wed, May 10, 2017 at 7:11 AM, Mike Carey <dtab...@gmail.com> wrote:
>>
>> I will leave it to the official GSC mentor (who's also a leading expert on
>>> big spatial data) to direct - I was just suggesting that step 0 should be
>>> to become familiar with what's already there currently, to have a working
>>> knowledge of that as background.
>>>
>>> :-)
>>>
>>> Looking forward to seeing this project unfold!
>>>
>>> Cheers,
>>>
>>> Mike
>>>
>>>
>>>
>>> On 5/9/17 10:14 PM, Riyafa Abdul Hameed wrote:
>>>
>>> Hi,
>>>>
>>>> As I understand from playing with the current support for GIS objects (point,
>>>> polygon, circle, and rectangle), it is similar to the well-known text
>>>> format--correct me if I am mistaken. Hence, initially we could support
>>>> other GIS objects in WKT and support GeoJSON if time permits.
>>>>
>>>> Thank you.
>>>> Yours sincerely,
>>>> Riyafa
>>>>
>>>> On 8 May 2017 at 23:31, Mike Carey <dtab...@gmail.com> wrote:
>>>>
>>>> I would also suggest playing with the current geo support in AsterixDB
>>>>
>>>>> (curretn types and indexing and functions in queries) to get warmed up.
>>>>> Welcome aboard...!!
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>> On 5/8/17 8:51 AM, Riyafa Abdul Hameed wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>> I have been selected to contribute to the issue ASTERIXDB-1371
>>>>>> <https://issues.apache.org/jira/browse/ASTERIXDB-1371> for GSoC this
>>>>>> time.
>>>>>> This being the community bonding period I am trying to familiarize
>>>>>> myself
>>>>>> with the code base of AsterixDB and to get a grasp of the task.
>>>>>>
>>>>>> I am under the impression that the package *org.apache.asterix.om
>>>>>> <http://org.apache.asterix.om> *has the classes for handling data
>>>>>> models
>>>>>> for AsterixDB and have been looking into them to figure out the
>>>>>> implementation details. Please correct me if I am mistaken.
>>>>>>
>>>>>> I have also been reading on the specification for well known text[1]
>>>>>> and
>>>>>> GeoJSON[2] and have been trying to figure out if implementing one of
>>>>>> them
>>>>>> would suffice (if so which one) or if both needs to be implemented. If
>>>>>> both
>>>>>> needs to be implemented we should decide which needs to be implemented
>>>>>> first. I was thinking of going for GeoJSON as it seems to have a wider
>>>>>> usage.
>>>>>>
>>>>>> Any suggestions on how I should proceed with the project would be
>>>>>> highly
>>>>>> valued.
>>>>>>
>>>>>> [1] http://docs.opengeospatial.org/is/12-063r5/12-063r5.html
>>>>>> [2] https://tools.ietf.org/html/rfc7946
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Yours sincerely,
>>>>>> Riyafa
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>


-- 

*Regards,*
Wail Alkowaileet


Re: meet problem in compiling asterixdb

2017-04-24 Thread Wail Alkowaileet
Another thing: what's your Maven version?

On Mon, Apr 24, 2017 at 8:02 PM, Ian Maxon <ima...@uci.edu> wrote:

> Which mvn goal are you running, and are you running it at the top level?
>
> On Sun, Apr 23, 2017 at 8:45 PM, zater <za...@vip.qq.com> wrote:
> > To developer:
> > I want to use maven to compile this project. I ran into this problem:
> >
> > [ERROR] Failed to parse plugin descriptor for
> > org.apache.hyracks:license-automation-plugin:0.3.1-SNAPSHOT
> > (C:\Users\zater\git\asterixdb\hyracks-fullstack\hyracks\hyracks-maven-plugins\license-automation-plugin\target\classes):
> > No plugin descriptor found at META-INF/maven/plugin.xml -> [Help 1]
> > [ERROR]
> > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> > [ERROR]
> > [ERROR] For more information about the errors and possible solutions, please read the following articles:
> > [ERROR] [Help 1]
> > http://cwiki.apache.org/confluence/display/MAVEN/PluginDescriptorParsingException
> >
> > I read this link and searched google. I found that removing the whole
> > repository may work; I did that and it did not work. I even tried to remove
> > "C:\Users\zater\git\asterixdb\hyracks-fullstack\hyracks\hyracks-maven-plugins\license-automation-plugin\target\classes",
> > which also did not work. Can you give me some guidance on it? Thanks!
> > Zater
> > 2017/4/24
>



-- 

*Regards,*
Wail Alkowaileet


Re: Force LSM component flush & NC-CC messaging ACK

2017-01-21 Thread Wail Alkowaileet
I remember one reason to enforce a flush is the Pregelix connector [1][2][3].

For the messaging framework, I believe that you probably have the same
issue I had. I did what Till suggested, as robustness is guaranteed by
AsterixDB itself and not by the user, who might kill the process anyway.

[1]
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/main/java/org/apache/asterix/api/http/servlet/ConnectorAPIServlet.java
[2]
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/main/java/org/apache/asterix/util/FlushDatasetUtils.java
[3]
https://github.com/apache/asterixdb/blob/2f9d4c3ab4d55598fe9a14fbf28faef12bed208b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/operators/std/FlushDatasetOperatorDescriptor.java

On Sat, Jan 21, 2017 at 7:17 PM, Mike Carey <dtab...@gmail.com> wrote:

> I believe Ildar is just looking for a way to ensure, in doing experiments,
> that things are all in disk components.  His stats-gathering extensions
> camp on the LSM lifecycle - flushes in particular - and he wants to finish
> that process in his testing and experiments.  Wail's schema inference stuff
> has a similar flavor.  So the goal is to flush any lingering memory
> components to disk for a given dataset at the end of the "experiment
> lifecycle".
>
> We have DDL to compact a dataset - which flushes AND compacts - it might
> also be useful to have DDL to flush a dataset without also forcing
> compaction - as a way for an administrator to release that dataset's
> in-memory component related resources.  (Not that it's "necessary" for any
> correctness reason - just might be nice to be able to do that.  That could
> also be useful in scripting more user-level-oriented recovery tests.)
>
> Thus, I'd likely vote for adding a harmless new DDL statement - another
> arm of the one that supports compaction - for this.
>
> Cheers,
>
> Mike
>
>
>
> On 1/21/17 6:21 AM, Till Westmann wrote:
>
>> Hi Ildar,
>>
>> On 19 Jan 2017, at 4:02, Ildar Absalyamov wrote:
>>
>> Since I was out for quite a while and a lot of things happened in a
>>> meantime in a codebase I wanted to clarify couple of things.
>>>
>>> I was wondering if there is any legitimate way to force the data of
>>> in-memory components to be flushed, other than stopping the whole instance?
>>> It used to be that choosing a different default dataverse with “use”
>>> statement did that trick, but that is not the case anymore.
>>>
>>
>> Just wondering, why do you want to flush the in-memory components to disk?
>>
>> Another question is regarding CC<->NC & NC<->NC messaging. Does the
>>> sender get some kind of ACK that the message was received by the addressee?
>>> Say if I send a message just before the instance shutdown will the shutdown
>>> hook wait until the message is delivered and processed?
>>>
>>
>> I agree with Murtadha, that I can certainly be done. However, we also
>> need to assume that some shutdowns won’t be clean and so the messages might
>> not be received. So it might be easier to just be able to recover from
>> missing messages than to be able to recover *and* to synchronize on
>> shutdown. Just a thought - maybe that’s not even an issue for your use-case.
>>
>> Cheers,
>> Till
>>
>
>


-- 

*Regards,*
Wail Alkowaileet


Re: Time of Multiple Joins in AsterixDB

2016-12-20 Thread Wail Alkowaileet
 "configUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config;,
> > "node_id": "red2",
> > "partitions": [{
> > "active": true,
> > "partition_id": "partition_14"
> > }],
> > "state": "ACTIVE",
> > "statsUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats;,
> > "threadDumpUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump;
> > },
> > {
> > "configUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config;,
> > "node_id": "red4",
> > "partitions": [{
> > "active": true,
> > "partition_id": "partition_12"
> > }],
> > "state": "ACTIVE",
> > "statsUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats;,
> > "threadDumpUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump;
> > },
> > {
> > "configUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config;,
> > "node_id": "red3",
> > "partitions": [{
> > "active": true,
> > "partition_id": "partition_13"
> > }],
> > "state": "ACTIVE",
> > "statsUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats;,
> > "threadDumpUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump;
> > },
> > {
> > "configUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config;,
> > "node_id": "red9",
> > "partitions": [{
> > "active": true,
> > "partition_id": "partition_7"
> > }],
> > "state": "ACTIVE",
> > "statsUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats;,
> > "threadDumpUri":
> > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump;
> > }
> > ],
> > "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown;,
> > "state": "ACTIVE",
> > "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version;
> > }
> >
> > 2.Catalog_return:2.28G
> >
> > catalog_sales:31.01G
> >
> > inventory:8.63G
> >
> > 3. As for Pig and Hive, I always use the default configuration. I didn't
> > set the partition settings for them. And for Spark, we use 200 partitions,
> > which may be improved but is not bad. For AsterixDB, I also set up the
> > cluster using the default values for partitions and JVM settings (I didn't
> > manually set these parameters).
> >
> >
> >
> > On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu <buyin...@gmail.com> wrote:
> >
> > > Mingda,
> > >
> > >  1. Can you paste the returned JSON of
> > > http://<master node>:19002/admin/cluster at your side? (Pls replace
> > > <master node> with the actual master node name or IP)
> > >  2. Can you list the individual size of each dataset involved in the
> > > query, e.g., catalog_returns, catalog_sales, and inventory?  (I assume
> > > 100GB is the overall size?)
> > >  3. Do Spark/Hive/Pig saturate all CPUs on all machines, i.e., how many
> > > partitions are running on each machine?  (It seems that your AsterixDB
> > > configuration wouldn't saturate all CPUs for queries --- in the current
> > > AsterixDB master, the computation parallelism is set to be the same as
> > > the storage parallelism (i.e., the number of iodevices on each NC). I've
> > > submitted a new patch that allows flexible computation parallelism,
> > > which should be able to get merged into master very soon.)
> > >  Thanks!
> > >
> > > Best,
> > > Yingyi
> > >
> > > On Tue, Dec 20, 2016 at 5:44 PM, mingda li <limingda1...@gmail.com>
> > wrote:
> > >
> > > > Oh, sure. When we test the 100G multiple join, we find AsterixDB is
> > > > slower than Spark (but still faster than Pig and Hive).
> > > > I can share with you both plots: 1-10G.eps and 1-100G.eps. (We will
> > > > only use 1-10G.eps in our paper.)
> > > > And thanks for Ian's advice: *The dev list generally strips
> > > > attachments. Maybe you can just put the config inline? Or link to a
> > > > pastebin/gist?* I know why you can't see the attachments. So I moved
> > > > the plots and two documents to my Dropbox.
> > > > You can find
> > > > 1-10G.eps here: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
> > > > 1-100G.eps here: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
> > > > cc_conf.pdf here: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
> > > > CompleteQuery.pdf here:
> > > > https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
> > > >
> > > > On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <
> tcondie.u...@gmail.com>
> > > > wrote:
> > > >
> > > > > Mingda: Please also share the numbers for 100GB, which show
> > > > > AsterixDB not quite doing as well as Spark. These 100GB results will
> > > > > not be in our submission version, since they’re not needed for the
> > > > > desired message: picking the right join order matters. Nevertheless,
> > > > > I’d like to get a better understanding of what’s going on in the
> > > > > larger dataset regime.
> > > > >
> > > > >
> > > > >
> > > > > -Tyson
> > > > >
> > > > >
> > > > >
> > > > > From: Yingyi Bu [mailto:buyin...@gmail.com]
> > > > > Sent: Tuesday, December 20, 2016 4:30 PM
> > > > > To: dev@asterixdb.apache.org
> > > > > Cc: Michael Carey <mjca...@ics.uci.edu>; Tyson Condie <
> > > > > tcondie.u...@gmail.com>
> > > > > Subject: Re: Time of Multiple Joins in AsterixDB
> > > > >
> > > > >
> > > > >
> > > > > Hi Mingda,
> > > > >
> > > > >
> > > > >
> > > > >  It looks like you didn't attach the pdf?
> > > > >
> > > > >  Thanks!
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Yingyi
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 20, 2016 at 4:15 PM, mingda li <limingda1...@gmail.com
> > > > > <mailto:limingda1...@gmail.com> > wrote:
> > > > >
> > > > > Sorry for the wrong version of cc.conf. I converted it to a pdf
> > > > > version as an attachment.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 20, 2016 at 4:06 PM, mingda li <limingda1...@gmail.com
> > > > > <mailto:limingda1...@gmail.com> > wrote:
> > > > >
> > > > > Dear all,
> > > > >
> > > > >
> > > > >
> > > > > I am testing different systems' (AsterixDB, Spark, Hive, Pig)
> > > > > multiple joins to see if there is a big difference with different
> > > > > join orders. This is the reason for our research on multiple joins,
> > > > > and the results will appear in our paper, which is to be submitted
> > > > > to VLDB soon. Could you help us make sure that the test results make
> > > > > sense for AsterixDB?
> > > > >
> > > > >
> > > > >
> > > > > We configured AsterixDB 0.8.9 (using
> > > > > asterix-server-0.8.9-SNAPSHOT-binary-assembly) on our cluster of 16
> > > > > machines, each with a 3.40GHz i7 processor (4 cores and 2
> > > > > hyper-threads per core), 32GB of RAM, and 1TB of disk capacity. The
> > > > > operating system is 64-bit Ubuntu 12.04; the JDK version is 1.8.0.
> > > > > During configuration, I followed the NCService instructions here:
> > > > > https://ci.apache.org/projects/asterixdb/ncservice.html. And I set
> > > > > the cc.conf as in the attachment. (Each node works as an NC, and the
> > > > > first node also works as the CC.)
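[Editor's note: the cc.conf attachment was stripped by the list (hence the
Dropbox links above). For readers who have never seen one, here is a minimal
illustrative sketch of an ini-style NCService cc.conf from that era. The
hostnames and paths are placeholders rather than Mingda's actual settings,
and key names may differ slightly across AsterixDB versions.]

[nc/nc1]
address=scai01.cs.ucla.edu
iodevices=/data/disk1/asterix,/data/disk2/asterix
txnlogdir=/data/disk1/asterix/txnlog
coredumpdir=/data/disk1/asterix/coredump

[nc]
command=asterixnc

[cc]
address=scai01.cs.ucla.edu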
> > > > >
> > > > >
> > > > >
> > > > > For the experiments, we use 3 fact tables from TPC-DS: inventory,
> > > > > catalog_sales, and catalog_returns, with TPC-DS scale factors 1g and
> > > > > 10g. The multiple-join queries we use in AsterixDB are as follows:
> > > > >
> > > > >
> > > > >
> > > > > Good Join Order:
> > > > > SELECT COUNT(*) FROM
> > > > >   (SELECT * FROM catalog_sales cs1
> > > > >      JOIN catalog_returns cr1
> > > > >        ON (cs1.cs_order_number = cr1.cr_order_number AND
> > > > >            cs1.cs_item_sk = cr1.cr_item_sk)) m1
> > > > >   JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk;
> > > > >
> > > > >
> > > > >
> > > > > Bad Join Order:
> > > > > SELECT COUNT(*) FROM
> > > > >   (SELECT * FROM catalog_sales cs1
> > > > >      JOIN inventory i1 ON cs1.cs_item_sk = i1.inv_item_sk) m1
> > > > >   JOIN catalog_returns cr1
> > > > >     ON (cs1.cs_order_number = cr1.cr_order_number AND
> > > > >         cs1.cs_item_sk = cr1.cr_item_sk);
> > > > >
> > > > >
> > > > >
> > > > > We load the data into AsterixDB first and then run the two
> > > > > different queries. (The complete version of all queries for
> > > > > AsterixDB is in the attachment.) We assume the data has already
> > > > > been stored in AsterixDB and only count the time for the multiple
> > > > > join.
> > > > >
> > > > >
> > > > >
> > > > > Meanwhile, we use the same dataset and queries to test Spark, Pig,
> > > > > and Hive. The result is shown in the attachment's figure, and you
> > > > > can see that AsterixDB's time is always better than the others', no
> > > > > matter whether the join order is good or bad :-) (BTW, the y scale
> > > > > of the figure is time in log scale. You can see the time in the
> > > > > label of each bar.)
> > > > >
> > > > >
> > > > >
> > > > > Thanks for your help.
> > > > >
> > > > >
> > > > >
> > > > > Bests,
> > > > >
> > > > > Mingda
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>



-- 

*Regards,*
Wail Alkowaileet


Status

2016-11-04 Thread Wail Alkowaileet
Hi all,

Unfortunately I won't be able to attend this meeting. And here's my status:
- I was mainly working on the Tuple-Level compaction.
- I helped the MIT folks with exporting and querying the data.
- Finally, there is an initial plan to incorporate Cloudberry in one of our
projects (it's targeted at a city instead of a nation).

Thanks


Re: Orderedlist vs. unorderedlist as default open type

2016-11-03 Thread Wail Alkowaileet
My mistake... That's only for the Twitter feed.

On Nov 3, 2016 14:11, "Wail Alkowaileet" <wael@gmail.com> wrote:

> Dears,
>
> Currently, an unordered list is the default type of a JSON array if it
> resides in the open part.
> That means the user won't be able to access any item of the list by index,
> which is unexpected, at least for my colleagues who use AsterixDB. I think
> only JSON types should appear in the open part.
>
> Also, I believe there's an inconsistency: when we do *group by .. with ..*,
> the result of the "with" clause is an ordered list.
>
> Any thoughts ?
>
> --
>
> *Regards,*
> Wail Alkowaileet
>


Does Projection affect count() performance?

2016-10-01 Thread Wail Alkowaileet
Hi,

I know that early projection will enhance performance.
I just noticed something:

1- returning the whole tuple
count( for $x in dataset Tweets
return $x
)

=> Throws a "Java heap exceeded" exception. (The heap size is less than the
sum of AsterixDB's configured memory ... so it's not a problem.)

2- However, returning one field
count( for $x in dataset Tweets
return $x.id
)

=> Worked just fine.

I'm just wondering, does the projection in count() affect its performance?
-- 

*Regards,*
Wail Alkowaileet


Tuple-level Compaction on AsterixDB (open-to-close type)

2016-09-23 Thread Wail Alkowaileet
Hi all,

In the previous week, I resumed working on and testing the idea of having an
opportunistic module by exposing an API on Hyracks. Upper layers, then, can
implement those APIs to be more involved in how the data is stored.

I wrote the first draft of the design document. I apologize in advance for
it being too wordy.

Here's the link to the design document:
https://docs.google.com/document/d/1Rrhz2Kn9GLJ2OhPbmoHQjth85EA8JExSD7Xnjw7F5aA/edit?usp=sharing

Your feedback is highly appreciated.
-- 

*Regards,*
Wail Alkowaileet


Re: Creating RTree: no space left

2016-09-15 Thread Wail Alkowaileet
Hi Ahmed and Mike,

@Ahmed
I actually did a small experiment where I loaded about 1/5 of the data (so
I could index it), and it seems that the R-Tree was really useful for
querying small regions or neighborhoods.
I also tried the B-Tree, and it was slower than a full scan.

@Mike
Unfortunately, I still cannot, even after anonymization :-)


On Wed, Sep 14, 2016 at 11:29 PM, Mike Carey <dtab...@gmail.com> wrote:

> Interesting point, so to speak.  @Wail, any chance you could post a Google
> Maps screenshot showing a visualization of the points in this dataset on
> the underlying geographic region?  (If the dataset is shareable in that
> anonymized form?)  I would think an R-tree would still be good for
> small-region geo queries - possibly shrinking the candidate object set by a
> factor of 10,000 - so still useful - and we also do index-AND-ing now, so
> we would also combine that shrinkage with other index-provided shrinkage on
> any other index-amenable predicates.  I think the queries are still spatial
> in nature, and the only AsterixDB choice for that is the R-tree.  (We did
> experiments with things like Hilbert B-trees, but the results led to the
> conclusion that the code base only needs R-trees for spatial data for the
> foreseeable future - they just work too well and in a no-tuning-required
> fashion :-))
>
>
>
> On 9/14/16 12:49 PM, Ahmed Eldawy wrote:
>
>> Looks like an interesting case. Just a small question. Are you sure a
>> spatial index is the right one to use here? The spatial attribute looks
>> more like a categorization and a hash or B-tree index could be more
>> suitable. As far as I know, the spatial index in AsterixDB is a secondary
>> R-tree index which, like any other secondary index, is only good for
>> retrieving a small number of records. For this dataset, it seems that any
>> small range would still return a huge number of records.
>>
>> It is still interesting to further investigate and fix the sort issue, but
>> I mentioned the usage issue for a different perspective.
>>
>> Thanks
>> Ahmed
>>
>> On Wed, Sep 14, 2016 at 10:30 AM Mike Carey <dtab...@gmail.com> wrote:
>>
>> ☺!
>>>
>>> On Sep 14, 2016 1:11 AM, "Wail Alkowaileet" <wael@gmail.com> wrote:
>>>
>>> To be exact
>>>> I have 2,255,091,590 records and 10,391 points :-)
>>>>
>>>> On Wed, Sep 14, 2016 at 10:46 AM, Mike Carey <dtab...@gmail.com> wrote:
>>>>
>>>>> Thx!  I knew I'd meant to "activate" the thought somehow, but couldn't
>>>>> remember having done it for sure.  Oops! Scattered from VLDB, I guess...!
>>>>>
>>>>> On 9/13/16 9:58 PM, Taewoo Kim wrote:
>>>>>
>>>>>> @Mike: You filed an issue -
>>>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1639. :-)
>>>>>>
>>>>>> Best,
>>>>>> Taewoo
>>>>>>
>>>>>> On Tue, Sep 13, 2016 at 9:28 PM, Mike Carey <dtab...@gmail.com> wrote:
>>>>>>
>>>>>>> I can't remember (slight jetlag? :-)) if I shared back to this list
>>>>>>> one theory that came up in India when Wail and I talked F2F - his
>>>>>>> data has a lot of duplicate points, so maybe something goes awry in
>>>>>>> that case. I wonder if we've sufficiently tested that case?  (E.g.,
>>>>>>> what if there are gazillions of records originating from a small
>>>>>>> handful of points?)
>>>>>>>
>>>>>>> On 8/26/16 9:55 AM, Taewoo Kim wrote:
>>>>>>>
>>>>>>>> Based on a rough calculation, per partition, each point field takes
>>>>>>>> 3.6GB (16 bytes * 2887453794 records / 12 partitions). To sort
>>>>>>>> 3.6GB, we are generating 625 files (96MB or 128MB each) = 157GB.
>>>>>>>> Since Wail mentioned that there was no issue when creating a B+ tree
>>>>>>>> index, we need to check what SORT process is required by the R-Tree
>>>>>>>> index.

Indexing non-ADM data.

2016-09-02 Thread Wail Alkowaileet
Hi Dev,

In the last year or so I have been more involved in AsterixDB. However, I'm
90% user and 10% developer (due to the nature of my work). I want to share
some of my (and my colleagues') experience with ADM, though I might be
stating the obvious.

One of the challenges we face most of the time is indexing non-ADM data.
Most of the data is either in JSON or CSV format, which means all the ADM
richness is not usable.

For instance, to load data, I usually create an External (or Temporary)
Dataset, query/transform it, and then insert it into my Internal Dataset,
which takes more time compared with a bulk load, as a result of flush/merge
operations.

Another challenging case is the TwitterFeed example
<https://ci.apache.org/projects/asterixdb/feeds/tutorial.html>: the
*longitude* and *latitude* fields are not indexable, and I need to ETL into
another dataset to transform (lon, lat) into a point type*.*

It would be awesome if we could bridge non-ADM to ADM types.


-- 

*Regards,*
Wail Alkowaileet


Re: Modified/Custom plan: Push-down SELECT for external source.

2016-08-27 Thread Wail Alkowaileet
Cool.
Thanks Yingyi!

On Sat, Aug 27, 2016 at 1:58 AM, Yingyi Bu <buyin...@gmail.com> wrote:

> Currently you can push Project into the source but not Select.
> You're welcome to enhance IMetadataProvider to support that.  You can take
> a look at DataSourceScanPOperator:
>
> Pair<IOperatorDescriptor, AlgebricksPartitionConstraint> p =
>         mp.getScannerRuntime(dataSource, vars, projectVars,
>                 scan.isProjectPushed(), scan.getMinFilterVars(),
>                 scan.getMaxFilterVars(), opSchema, typeEnv, context,
>                 builder.getJobSpec(), implConfig);
>
>
> Best,
>
> Yingyi
>
>
> On Fri, Aug 26, 2016 at 3:44 PM, Wail Alkowaileet <wael@gmail.com>
> wrote:
>
> > Hi AsterixDBers.
> >
> > Is there an easy way to push down a filter to an external source (in my
> > case Parquet) without being too intrusive?
> >
> > This can perform way faster than STREAM-SELECT, as Parquet can potentially
> > skip Row Groups while scanning.
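[Editor's note: for readers unfamiliar with the row-group skipping Wail
mentions, below is a minimal standalone sketch using parquet-hadoop's
filter2 API - this is not AsterixDB code, and the file path and column name
are hypothetical. With a pushed-down predicate, row groups whose column
min/max statistics cannot satisfy it are skipped without being read, unlike
a full scan followed by a STREAM-SELECT.]

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.filter2.compat.FilterCompat;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetReader;

import static org.apache.parquet.filter2.predicate.FilterApi.gt;
import static org.apache.parquet.filter2.predicate.FilterApi.longColumn;

public class ParquetPushdownSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical predicate: calls longer than 60 seconds.
        FilterPredicate longCalls = gt(longColumn("duration"), 60L);

        // Placeholder path; the reader prunes row groups via statistics.
        try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(new Path("/tmp/cdr.parquet"))
                        .withFilter(FilterCompat.get(longCalls))
                        .build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                System.out.println(record);
            }
        }
    }
}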
> > --
> >
> > *Regards,*
> > Wail Alkowaileet
> >
>



-- 

*Regards,*
Wail Alkowaileet


Re: Creating RTree: no space left

2016-08-26 Thread Wail Alkowaileet
@Jianfeng: Sorry for the stupid question, but it seems that the logs and the
WebUI do not show the plan. Is there a flag for that?

@Taewoo: I'll look into it and see what's going on. AFAIK, the comparator
is Hilbert.

On Fri, Aug 26, 2016 at 7:55 PM, Taewoo Kim <wangs...@gmail.com> wrote:

> Based on a rough calculation, per partition, each point field takes 3.6GB
> (16 bytes * 2887453794 records / 12 partitions). To sort 3.6GB, we are
> generating 625 files (96MB or 128MB each) = 157GB. Since Wail mentioned
> that there was no issue when creating a B+ tree index, we need to check
> what SORT process is required by the R-Tree index.
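[Editor's note: a quick back-of-the-envelope check of the numbers in this
thread - a sketch; the record, partition, and run-file counts come from the
emails, and the byte figures are approximate.]

// Sanity-check the sort sizing discussed in this thread.
public class SortSizingCheck {
    public static void main(String[] args) {
        long records = 2_887_453_794L; // total records (from the thread)
        int partitions = 12;           // 3 NCs x 4 iodevices
        int pointBytes = 16;           // a point: two 8-byte doubles

        double perPartitionGiB =
                (double) records / partitions * pointBytes / (1L << 30);
        System.out.printf("sort input per partition: %.2f GiB%n", perPartitionGiB);
        // ~3.58 GiB, matching the "3.6GB" estimate above.

        // 625 run files of 96-128MB each:
        double runsLowGiB = 625 * 96.0 / 1024;
        double runsHighGiB = 625 * 128.0 / 1024;
        System.out.printf("runs: %.0f-%.0f GiB%n", runsLowGiB, runsHighGiB);
        // ~59-78 GiB of runs (and 157GB per iodevice reported at the peak)
        // for a ~3.6 GiB sort input is the anomaly being debugged: far more
        // run data than the field being sorted, pointing at run files that
        // are not reclaimed, or at much more than the point field being
        // sorted.
    }
}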
>
> Best,
> Taewoo
>
> On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia <jianfeng@gmail.com>
> wrote:
>
> > If all of the file names start with “ExternalSortRunGenerator”, then they
> > are the first-round files, which cannot be GCed.
> > Could you provide the query plan as well?
> >
> > > On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <wael@gmail.com>
> > wrote:
> > >
> > > Hi Ian and Pouria,
> > >
> > > The names of the files along with their sizes (there were 625 of those
> > > before crashing):
> > >
> > > size    name
> > > 96MB    ExternalSortRunGenerator8917133039835449370.waf
> > > 128MB   ExternalSortRunGenerator8948724728025392343.waf
> > >
> > > no files were generated beyond runs.
> > > compiler.sortmemory = 64MB
> > >
> > > Here are the full logs
> > > <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
> > >
> > > On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <
> > pouria.pirza...@gmail.com>
> > > wrote:
> > >
> > >> We previously had issues with huge spilled sort temp files when
> > >> creating inverted indexes for fuzzy queries, but NOT R-Trees.
> > >> I also recall that Yingyi fixed the issue of delaying clean-up for
> > >> intermediate temp files until the end of the query execution.
> > >> If you can share the names of a couple of temp files (and their sizes,
> > >> along with the sort memory setting you have in
> > >> asterix-configuration.xml), we may be able to make a better guess as
> > >> to whether the sort is really going into a two-level merge or not.
> > >>
> > >> Pouria
> > >>
> > >> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <ima...@uci.edu> wrote:
> > >>
> > >>> I think that exception ("No space left on device") is just cast from
> > >>> the native IOException. Therefore I would be inclined to believe it's
> > >>> genuinely out of space. I suppose the question is why the external
> > >>> sort is so huge. What is the query plan? Maybe that will shed light
> > >>> on a possible cause.
> > >>>
> > >>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <wael@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> I was monitoring Inodes ... it didn't go beyond 1%.
> > >>>>
> > >>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <wael@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Chris and Mike,
> > >>>>>
> > >>>>> Actually I was monitoring it to see what's going on:
> > >>>>>
> > >>>>>   - The size of each partition is about 40GB (80GB in total per
> > >>>>>   iodevice).
> > >>>>>   - The runs took 157GB per iodevice (about 2x the dataset size).
> > >>>>>   Each run takes either 128MB or 96MB of storage.
> > >>>>>   - At a certain time, there were 522 runs.
> > >>>>>
> > >>>>> I even tried to create a BTree Index to see if that happens as
> > >>>>> well. I created two BTree indexes, one for the *location* and one
> > >>>>> for the *caller*, and they were created successfully. The sizes of
> > >>>>> the runs didn't come anywhere near that.
> > >>>>>
> > >>>>> Logs are attached.
> > >>>>>
> > >>>>> On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <dtab...@gmail.com>
> > >>>>> wrote:

Re: Creating RTree: no space left

2016-08-24 Thread Wail Alkowaileet
Hi Ian and Pouria,

The names of the files along with their sizes (there were 625 of those
before crashing):

size    name
96MB    ExternalSortRunGenerator8917133039835449370.waf
128MB   ExternalSortRunGenerator8948724728025392343.waf

no files were generated beyond runs.
compiler.sortmemory = 64MB

Here are the full logs
<https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>

On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <pouria.pirza...@gmail.com>
wrote:

> We previously had issues with huge spilled sort temp files when creating
> inverted indexes for fuzzy queries, but NOT R-Trees.
> I also recall that Yingyi fixed the issue of delaying clean-up for
> intermediate temp files until the end of the query execution.
> If you can share the names of a couple of temp files (and their sizes, along
> with the sort memory setting you have in asterix-configuration.xml), we may
> be able to make a better guess as to whether the sort is really going into a
> two-level merge or not.
>
> Pouria
>
> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <ima...@uci.edu> wrote:
>
> > I think that exception ("No space left on device") is just cast from the
> > native IOException. Therefore I would be inclined to believe it's
> > genuinely out of space. I suppose the question is why the external sort
> > is so huge.
> > What is the query plan? Maybe that will shed light on a possible cause.
> >
> > On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <wael@gmail.com>
> > wrote:
> >
> > > I was monitoring Inodes ... it didn't go beyond 1%.
> > >
> > > On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <wael@gmail.com>
> > > wrote:
> > >
> > > > Hi Chris and Mike,
> > > >
> > > > Actually I was monitoring it to see what's going on:
> > > >
> > > >- The size of each partition is about 40GB (80GB in total per
> > > >iodevice).
> > > >- The runs took 157GB per iodevice (about 2x the dataset size).
> > > >Each run takes either 128MB or 96MB of storage.
> > > >- At a certain time, there were 522 runs.
> > > >
> > > > I even tried to create a BTree Index to see if that happens as well.
> > > > I created two BTree indexes, one for the *location* and one for the
> > > > *caller*, and they were created successfully. The sizes of the runs
> > > > didn't come anywhere near that.
> > > >
> > > > Logs are attached.
> > > >
> > > > On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <dtab...@gmail.com> wrote:
> > > >
> > > >> I think we might have "file GC issues" - I vaguely remember that we
> > > >> don't (or at least didn't once upon a time) proactively remove
> > > >> unnecessary run files - removing all of them at end-of-job instead of
> > > >> at the end of the execution phase that uses their contents.  We may
> > > >> also have an "Amdahl problem" right now with our sort, since we
> > > >> serialize phase two of parallel sorts - though this is not a query,
> > > >> it's an index build, so that shouldn't be it.  It would be
> > > >> interesting to put a df/sleep script on each of the nodes when this
> > > >> is happening - actually a script that monitors the temp file
> > > >> directory - and watch the lifecycle happen and the sizes change
> > > >>
> > > >>
> > > >>
> > > >> On 8/23/16 2:06 AM, Chris Hillery wrote:
> > > >>
> > > >>> When you get the "disk full" warning, do a quick "df -i" on the
> > > >>> device - possibly you've run out of inodes even if the space isn't
> > > >>> all used up. It's unlikely because I don't think AsterixDB creates a
> > > >>> bunch of small files, but worth checking.
> > > >>>
> > > >>> If that's not it, then can you share the full exception and stack
> > > >>> trace?
> > > >>>
> > > >>> Ceej
> > > >>> aka Chris Hillery
> > > >>>
> > > >>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <wael@gmail.com>
> > > >>> wrote:
> > > >>>
> > > 

Re: Creating RTree: no space left

2016-08-23 Thread Wail Alkowaileet
Hi Chris and Mike,

Actually I was monitoring it to see what's going on:

   - The size of each partition is about 40GB (80GB in total per iodevice).
   - The runs took 157GB per iodevice (about 2x the dataset size).  Each
   run takes either 128MB or 96MB of storage.
   - At a certain time, there were 522 runs.

I even tried to create a BTree Index to see if that happens as well. I
created two BTree indexes, one for the *location* and one for the *caller*,
and they were created successfully. The sizes of the runs didn't come
anywhere near that.

Logs are attached.

On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <dtab...@gmail.com> wrote:

> I think we might have "file GC issues" - I vaguely remember that we don't
> (or at least didn't once upon a time) proactively remove unnecessary run
> files - removing all of them at end-of-job instead of at the end of the
> execution phase that uses their contents.  We may also have an "Amdahl
> problem" right now with our sort since we serialize phase two of parallel
> sorts - though this is not a query, it's an index build, so that shouldn't be
> it.  It would be interesting to put a df/sleep script on each of the nodes
> when this is happening - actually a script that monitors the temp file
> directory - and watch the lifecycle happen and the sizes change
>
>
>
> On 8/23/16 2:06 AM, Chris Hillery wrote:
>
>> When you get the "disk full" warning, do a quick "df -i" on the device -
>> possibly you've run out of inodes even if the space isn't all used up. It's
>> unlikely because I don't think AsterixDB creates a bunch of small files,
>> but worth checking.
>>
>> If that's not it, then can you share the full exception and stack trace?
>>
>> Ceej
>> aka Chris Hillery
>>
>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <wael@gmail.com>
>> wrote:
>>
>>> I just cleared the hard drives to get 80% free space. I still get the same
>>> issue.
>>>
>>> The data contains:
>>> 1- 2887453794 records.
>>> 2- Schema:
>>>
>>> create type CDRType as {
>>>   id: uuid,
>>>   'date': string,
>>>   'time': string,
>>>   'duration': int64,
>>>   'caller': int64,
>>>   'callee': int64,
>>>   location: point?
>>> }
>>>
>>>
>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <wael@gmail.com>
>>> wrote:
>>>
>>> Dears,
>>>>
>>>> I have a dataset of size 290GB loaded in 3 NCs, each of which has
>>>> 2x500GB SSDs.
>>>>
>>>> Each NC has two IODevices (partitions) on each hard drive (i.e., the
>>>> total is 4 iodevices per NC). After loading the data, each Asterix
>>>> partition occupied 31GB.
>>>>
>>>> The cluster has about 50% free space on each hard drive (approximately
>>>> 250GB free space on each hard drive). However, when I tried to create
>>>> an index of type RTree, I got an exception that no space was left on
>>>> the hard drive during the External Sort phase.
>>>>
>>>> Is that normal ?
>>>>
>>>>
>>>> --
>>>>
>>>> *Regards,*
>>>> Wail Alkowaileet
>>>>
>>>>
>>>
>>> --
>>>
>>> *Regards,*
>>> Wail Alkowaileet
>>>
>>>
>


-- 

*Regards,*
Wail Alkowaileet
org.apache.hyracks.api.exceptions.HyracksException: Job failed on account of:
HYR0002: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device

	at org.apache.hyracks.control.cc.job.JobRun.waitForCompletion(JobRun.java:212)
	at org.apache.hyracks.control.cc.work.WaitForJobCompletionWork$1.run(WaitForJobCompletionWork.java:48)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0002: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device
	at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:62)
	at org.apache.hyracks.control.nc.Task.run(Task.java:319)
	... 3 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: No space left on device

Re: Creating RTree: no space left

2016-08-23 Thread Wail Alkowaileet
I just cleared the hard drives to get 80% free space. I still get the same
issue.

The data contains:
1- 2887453794 records.
2- Schema:

create type CDRType as {
  id: uuid,
  'date': string,
  'time': string,
  'duration': int64,
  'caller': int64,
  'callee': int64,
  location: point?
}


On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <wael@gmail.com>
wrote:

> Dears,
>
> I have a dataset of size 290GB loaded in 3 NCs, each of which has 2x500GB
> SSDs.
>
> Each NC has two IODevices (partitions) on each hard drive (i.e., the
> total is 4 iodevices per NC). After loading the data, each Asterix
> partition occupied 31GB.
>
> The cluster has about 50% free space on each hard drive (approximately
> 250GB free space on each hard drive). However, when I tried to create an
> index of type RTree, I got an exception that no space was left on the hard
> drive during the External Sort phase.
>
> Is that normal ?
>
>
> --
>
> *Regards,*
> Wail Alkowaileet
>



-- 

*Regards,*
Wail Alkowaileet


Creating RTree: no space left

2016-08-23 Thread Wail Alkowaileet
Dears,

I have a dataset of size 290GB loaded in 3 NCs, each of which has 2x500GB
SSDs.

Each NC has two IODevices (partitions) on each hard drive (i.e., the total
is 4 iodevices per NC). After loading the data, each Asterix partition
occupied 31GB.

The cluster has about 50% free space on each hard drive (approximately
250GB free space on each hard drive). However, when I tried to create an
index of type RTree, I got an exception that no space was left on the hard
drive during the External Sort phase.

Is that normal ?


-- 

*Regards,*
Wail Alkowaileet


Re: Trio: AsterixDB, Spark and Zeppelin.

2016-08-03 Thread Wail Alkowaileet
One more thing:
Can you paste your cluster configuration as well?

Thanks

On Wed, Aug 3, 2016 at 12:32 PM, Wail Alkowaileet <wael@gmail.com>
wrote:

> Hi Kevin,
>
> Thanks for testing it! I really appreciate it.
> I definitely tested it on my network (KACST), and I just tried to reproduce
> the same problem you have on the MIT network. It seems that I didn't get
> the same problem.
>
> Can you paste the full logs? I just want to know if the connector got the
> ResultLocations correctly?
>
>
> Thanks again :-)
>
> On Tue, Aug 2, 2016 at 4:25 PM, Coakley, Kevin <kcoak...@sdsc.edu> wrote:
>
>> Hi Wail,
>>
>> I was able to get the asterixdb-spark-connector to work as long as
>> asterixdb, zeppelin and spark are all running on the same server.
>>
>> When I try to access the asterixdb on a remote server, I receive the
>> org.apache.hyracks.api.exceptions.HyracksDataException: Connection fail
>> error at the bottom of this email.
>>
>> I don’t believe there are any firewalls between the two systems, so I am
>> unsure why I am receiving a connection failure. I looked at the hyracks
>> documentation at
>> https://github.com/apache/asterixdb/tree/master/hyracks-fullstack/hyracks/hyracks-documentation/src/books/user-guide
>> and it didn’t mention anything about how to access hyracks remotely. I
>> couldn’t find any additional documentation by searching Google.
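[Editor's note: when debugging the "Connection fail" error described above, a
useful first step is to probe, from the Spark driver machine, the exact
host/port pairs the connector must reach - the CC API endpoint from the
spark-shell --conf values in Kevin's command below, and likewise each NC's
result-location endpoints. A minimal sketch, with the address taken from
that command:]

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Probe a host:port the connector must reach; a timeout or refusal here
// points at routing/firewall issues rather than the connector itself.
public class PortProbe {
    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress("10.128.5.192", 19002), 5000);
            System.out.println("reachable");
        }
    }
}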
>>
>>
>> $ /opt/spark/bin/spark-shell --packages
>> org.apache.asterix:asterixdb-spark-connector_2.10:1.6.0 --conf
>> spark.asterix.connection.host=10.128.5.192 --conf
>> spark.asterix.connection.port=19002 --conf spark.asterix.frame.size=131072
>>
>> …
>>
>> scala>   rddAql.collect().foreach(println)
>> 16/08/02 20:18:49 DEBUG ClosureCleaner: +++ Cleaning closure 
>> (org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12) +++
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + declared fields: 2
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  public static final long
>> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.serialVersionUID
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  private final
>> org.apache.spark.rdd.RDD$$anonfun$collect$1
>> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$1
>> 2.$outer
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + declared methods: 2
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  public final
>> java.lang.Object
>> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(java.lang.Object)
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  public final
>> java.lang.Object
>> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(scala.collection.Ite
>> rator)
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + inner classes: 0
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + outer classes: 2
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:
>> org.apache.spark.rdd.RDD$$anonfun$collect$1
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  org.apache.spark.rdd.RDD
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + outer objects: 2
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  AsterixRDD[0] at RDD at
>> AsterixRDD.scala:38
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + populating accessed fields
>> because this is the starting closure
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + fields accessed by starting
>> closure: 2
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  (class
>> org.apache.spark.rdd.RDD$$anonfun$collect$1,Set($outer))
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  (class
>> org.apache.spark.rdd.RDD,Set(org$apache$spark$rdd$RDD$$evidence$1))
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + outermost object is not a
>> closure, so do not clone it: (class org.apache.spark.rdd.RDD,AsterixRDD[0]
>> at RDD at Ast
>> erixRDD.scala:38)
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + cloning the object 
>> of class org.apache.spark.rdd.RDD$$anonfun$collect$1
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + cleaning cloned closure
>>  recursively (org.apache.spark.rdd.RDD$$anonfun$collect$1)
>> 16/08/02 20:18:49 DEBUG ClosureCleaner: +++ Cleaning closure 
>> (org.apache.spark.rdd.RDD$$anonfun$collect$1) +++
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  + declared fields: 2
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  public static final long
>> org.apache.spark.rdd.RDD$$anonfun$collect$1.serialVersionUID
>> 16/08/02 20:18:49 DEBUG ClosureCleaner:  private final
>> org.apache.spark.rdd.RDD org.apache.spark.rdd.RDD$$anonfun$collect$1.$outer
>> 16/08/02 20:18:49 DEBUG ClosureCleaner

Re: Trio: AsterixDB, Spark and Zeppelin.

2016-07-26 Thread Wail Alkowaileet
Hi Ildar,

I added screenshots on how to configure Zeppelin with the connector.
https://github.com/Nullification/asterixdb-spark-connector

Please let me know how that goes.



On Tue, Jul 26, 2016 at 3:01 AM, Ildar Absalyamov <
ildar.absalya...@gmail.com> wrote:

> Hi Wail,
>
> I have tried to execute the example from the demo notebook in Zeppelin, but
> it seems it cannot see the asterix-spark-connector:
> :34: error: object asterix is not a member of package org.apache
>import org.apache.asterix.connector._
> How do I make Zeppelin aware of the connector? I have downloaded and built
> it with "sbt package && sbt assembly && sbt publish". Is there any other
> configuration on the Zeppelin side that I should do?
>
> > On Jul 18, 2016, at 05:26, Wail Alkowaileet <wael@gmail.com> wrote:
> >
> > Sorry. Here's the link for the connector:
> > https://github.com/Nullification/asterixdb-spark-connector <
> https://github.com/Nullification/asterixdb-spark-connector>
> >
> > On Mon, Jul 18, 2016 at 2:34 PM, Wail Alkowaileet <wael@gmail.com
> <mailto:wael@gmail.com>>
> > wrote:
> >
> >> Dears,
> >>
> >> Finally, I finished cleaning and documenting the AsterixDB-Spark
> >> connector and finalized the Zeppelin <http://zeppelin.apache.org/>
> >> interpreter for AQL and SQL++.
> >>
> >> AsterixDB-Spark Connector:
> >>
> >>   - Supports both AQL and SQL++ queries.
> >>   - Much cleaner code now.
> >>   - Please if you have ANY problem with it, create an issue in the
> >>   project repo.
> >>   - I'm working on a tutorial-video from the build to use it in
> Zeppelin.
> >>   - I recommend you to use Zeppelin. (you can import the connector
> >>   example notebook
> >>   <
> https://github.com/Nullification/asterixdb-spark-connector/tree/master/zeppelin-notebook/asterixdb-spark-example
> <
> https://github.com/Nullification/asterixdb-spark-connector/tree/master/zeppelin-notebook/asterixdb-spark-example
> >>
> >>   )
> >>
> >> Source Code: https://github.com/Nullification/asterixdb-spark-connector
> >>
> >> Apache Zeppelin with AsterixDB interpreter:
> >>
> >>   - Supports JSON-flattening (which will allow zeppelin to visualize
> >>   results).
> >>   - See attached screenshots.
> >>   - Will try to initiate pull request to merge it to Zeppelin master.
> >>
> >> Source Code: https://github.com/Nullification/zeppelin
> >>
> >> Finally, I just submitted the Schema Inferencer. I have to work on some
> >> Sonar comments, and it should be ready soon.
> >>
> >> Thanks!
> >>
> >> --
> >>
> >> *Regards,*
> >> Wail Alkowaileet
> >>
> >
> >
> >
> > --
> >
> > *Regards,*
> > Wail Alkowaileet
>
> Best regards,
> Ildar
>
>


-- 

*Regards,*
Wail Alkowaileet


Re: ADM parser question

2016-06-01 Thread Wail Alkowaileet
The file looks weird to me ... why is everything string-ified?

On Thu, Jun 2, 2016 at 4:30 AM, Ian Maxon <ima...@uci.edu> wrote:

> Also, it seems like the line # was wrong somehow, or at least it was not
> leading to the right part of the file. I was just stepping through the
> lexer in the debugger, and I saw that it was failing on "id" :
> "728286376236593152", which is line 310. Line 619 has no problems, nor do
> any of the lines adjacent to it.
>
> On Wed, Jun 1, 2016 at 6:21 PM, Ian Maxon <ima...@uci.edu> wrote:
>
> > Aha! Think I found it. The regular expression for the decimal replacement
> > was just deficient in the case that the field was something besides 0.0 :)
> > It should be [0-9]*.
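[Editor's note: a minimal Java equivalent of the corrected substitution -
Ian's original sed pattern \([0-9]\.[0-9]\)d, quoted later in this thread,
only matched single-digit decimals like 0.0d, and widening it with [0-9]*
handles values such as 33.97d. The helper name and example are illustrative,
and, like the sed hack, it can false-positive on matching sequences inside
string values.]

public class AdmSuffixStripper {
    // Strip ADM numeric suffixes so a dumped dataset round-trips through
    // the parser: drop i32/i64 markers and the 'd' suffix on decimal
    // literals (e.g., 33.97d -> 33.97).
    static String stripAdmSuffixes(String line) {
        return line.replaceAll("i32|i64", "")
                   .replaceAll("([0-9]*\\.[0-9]+)d", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripAdmSuffixes("{ \"lat\": 33.97d, \"cnt\": 5i32 }"));
        // -> { "lat": 33.97, "cnt": 5 }
    }
}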
> >
> > On Wed, Jun 1, 2016 at 12:30 PM, Ian Maxon <ima...@uci.edu> wrote:
> >
> >> I just did something as minimal/open as possible, like:
> >>
> >> "create type Tweet as open {id: string}"
> >>
> >> I'm not actually sure what the original type was.
> >>
> >> On Wed, Jun 1, 2016 at 12:21 PM, abdullah alamoudi <bamou...@gmail.com>
> >> wrote:
> >>
> >>> Ian,
> >>> Can you share the data type? I am trying to re produce this
> >>>
> >>> ~Abdullah.
> >>>
> >>> On Wed, Jun 1, 2016 at 8:49 AM, Ian Maxon <ima...@uci.edu> wrote:
> >>>
> >>> > Oh, I forgot the list strips attachments. Here's the snippet of the
> >>> data
> >>> > that's being troublesome:
> >>> >
> >>> >
> >>>
> https://drive.google.com/file/d/0B9fobkjZFASiRXAybS1BUXZvR1V6akE3VlhGTkVFU2ZkYzlB/view?usp=sharing
> >>> >
> >>> > On Tue, May 31, 2016 at 10:36 PM, Mike Carey <dtab...@gmail.com>
> >>> wrote:
> >>> >
> >>> > > We desperately need to make roundtripping work!!
> >>> > >
> >>> > >
> >>> > >
> >>> > > On 5/31/16 7:52 PM, Ian Maxon wrote:
> >>> > >
> >>> > >> Hi all,
> >>> > >>
> >>> > >> I have a question about something I am trying to coax the ADM
> >>> > >> parser into accepting. I have a file that I dumped from the SDSC
> >>> > >> testbed that has a bunch of tweets in it, just using curl and a
> >>> > >> dataset scan. The issue is that currently this doesn't work
> >>> > >> round-trip. However, in this case the modifications don't seem like
> >>> > >> they should be terribly severe, so I just tried my hand at using
> >>> > >> sed to fix it. The two things I think should make this hack work
> >>> > >> are: replacing the i32/i64 suffixes (so just s/i32//g) and removing
> >>> > >> decimal suffixes (s/\([0-9]\.[0-9]\)d/\1/g). This gives output that
> >>> > >> seems like it is "correct" to me. But the parser is still
> >>> > >> complaining and I don't understand why. It fails at line 619,
> >>> > >> column 228. The tweet on that line, and the one above it, work fine
> >>> > >> if I just use an insert statement.
> >>> > >>
> >>> > >> Does anyone have any thoughts as to maybe what's causing it to not
> >>> > >> take this input? I'm hoping it's just something silly I am too
> >>> > >> tired to see... Thanks in advance for any thoughts/suggestions.
> >>> > >>
> >>> > >> -Ian
> >>> > >>
> >>> > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>



-- 

*Regards,*
Wail Alkowaileet