Re: Timestamps and Cardinality in Queries

2017-03-02 Thread Liu, Eric
Turns out all of these issues were related to our firewall and internal maven 
settings haha. Building on my personal computer worked out fine. Thanks for the 
help though!

On 3/1/17, 5:35 PM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:

The mongo tests download and run a specially packaged version of Mongo.
It seems like it's having difficulty downloading. Can you hit the URL
for mongo?

Could not open inputStream for
http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-3.2.1.tgz


On Wed, Mar 1, 2017 at 6:04 PM Liu, Eric <eric@capitalone.com> wrote:

> Hm, maven runs now, but it’s getting this error in the Mongo tests:
> http://pastebin.com/Mt928ane
>
> On 3/1/17, 12:30 PM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:
>
> That's really strange.  Can you hit the maven central repo [1] from your
> machine?
>
> I guess you could try deleting the locationtech definition from your pom?
>
>
> [1] http://repo1.maven.org/maven2/org/apache/apache/17/
>
> On Wed, Mar 1, 2017 at 2:31 PM Liu, Eric <eric@capitalone.com>
> wrote:
>
> > Hmmm, deleting the files in .m2 doesn’t stop it from searching in
> > locationtech, and using the other mvn command gives me no log output.
> >
> > On 3/1/17, 10:55 AM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:
> >
> > traversing: gotcha.  I completely understand now.  And now I understand
> > how the prospector table would help with sniping out those nodes.
> >
> > maven: yep, that's the right git repo.  Locationtech is required when you
> > build with the 'geoindexing' profile.  Regardless, it's strange that maven
> > tried to get the apache pom from locationtech.  Deleting the
> > org/apache/apache directory should force maven to download the apache pom
> > from maven central.
> >
> > --Aaron
> >
> > On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <eric@capitalone.com> wrote:
> >
> > > Oh, that’s not an issue, that’s what we would like to do when traversing
> > > through the data. If a node has a high cardinality we don’t want to
> > > further traverse through its children.
> > >
> > > As for installation, did I clone the right repo for Rya? The one I’m
> > > using has locationtech repos for SNAPSHOT and RELEASE:
> > > https://github.com/apache/incubator-rya/blob/master/pom.xml
> > >
> > > On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:
> > >
> > > > Repos: The locationtech repo is up [1].  The issue is that your local
> > > > .m2 repo is in a bad state.  Maven is trying to get the apache pom
> > > > from locationtech.  Locationtech does not host that pom, instead it's
> > > > on maven central [2].
> > > >
> > > > Two ways to fix this issue (you should do (1) and that'll fix it...
> > > > (2) is just another option for reference).
> > > >
> > > > 1. Delete your apache pom directory from your local maven repo (e.g.
> > > > rm -rf ~/.m2/repository/org/apache/apache/)
> > > >
> > > > 2. Tell maven to ignore remote repository metadata with the -llr flag
> > > > (e.g. mvn clean install -llr -Pgeoindexing)
> > > >
> > > > Let me know if you have any other issues.
> > > >
> > > > deep/wide: okay, I don't understand this statement: "if the
> > > > cardinality of a node is too high (for example, a user that owns a
> > > > large number of

Re: Timestamps and Cardinality in Queries

2017-03-01 Thread Liu, Eric
Hey Aaron,

I’m currently setting up Rya to test these queries with some of our data. I run 
into an error when I run ‘mvn clean install’. I attached the logs, but it seems 
like I can’t connect to the snapshots repo you’re using.

As for “deep/wide”, it would be something like starting at a dataset, then 
fanning out looking for relations where it is either the subject or object, 
such as the user who created it, the job it came from, where it’s stored, etc. 
It would recurse on these neighboring nodes until a total number of results is 
reached. However, if the cardinality of a node is too high (for example, a user 
that owns a large number of datasets), the neighbors of that node will not be 
found. Really, the goal is to find the most distant relevant relationships 
possible, and this is our current naïve way of doing so.
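
A minimal sketch of the traversal described above, written as plain in-memory Python rather than anything Rya-specific. The names `build_index`, `expand`, `max_cardinality`, and `max_results` are hypothetical, the sample triples are made up, and "cardinality" is assumed here to mean the number of triples that touch a node.

```python
from collections import defaultdict, deque


def build_index(triples):
    """Index triples by node, whether the node is the subject or the object."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((s, p, o))
        if o != s:
            index[o].append((s, p, o))
    return index


def expand(triples, start, max_cardinality, max_results):
    """Breadth-first fan-out from `start`.

    Nodes touched by more than `max_cardinality` triples are not expanded,
    and the walk stops once `max_results` triples have been collected.
    """
    index = build_index(triples)
    seen_nodes = {start}
    seen_triples = set()
    results = []
    queue = deque([start])
    while queue and len(results) < max_results:
        node = queue.popleft()
        edges = index[node]
        if node != start and len(edges) > max_cardinality:
            continue  # high-cardinality node: do not traverse its neighbors
        for triple in edges:
            if triple in seen_triples:
                continue
            seen_triples.add(triple)
            results.append(triple)
            if len(results) >= max_results:
                break
            s, _, o = triple
            neighbor = o if s == node else s
            if neighbor not in seen_nodes:
                seen_nodes.add(neighbor)
                queue.append(neighbor)
    return results


# Made-up sample data: userA owns many datasets, so it is not expanded.
triples = [
    ("dataset1", "createdBy", "userA"),
    ("dataset1", "producedBy", "job7"),
    ("dataset1", "storedIn", "bucketX"),
    ("userA", "owns", "dataset2"),
    ("userA", "owns", "dataset3"),
    ("userA", "owns", "dataset4"),
    ("job7", "reads", "dataset5"),
]

found = expand(triples, "dataset1", max_cardinality=3, max_results=10)
for triple in found:
    print(triple)
```

In this sketch the edge that leads to a high-cardinality node is still reported; only the fan-out beyond that node is skipped. Whether that edge should also be dropped is a design choice the description above leaves open.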

Do you want to have a short call about this? I think it’d be easier to 
explain/answer questions over the phone. I’m free pretty much any time 1pm-5pm 
PST tomorrow (3/1).

Thanks,
Eric

On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:

deep vs wide: I played around with the property paths sparql operator and
put up an example here [1].  This is a slightly different query than the
one I sent out before.  It would be worth it for us to look at how this is
actually executed by OpenRDF.

Eric: Could you clarify what you mean by "deep vs wide"?  I think I
understand your queries, but I don't have a good intuition about those terms
and how cardinality might figure into a query.  It would probably be a bit
more helpful if you provided a model or general description that is
(somewhat) representative of your data.

--Aaron

[1] https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java

On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <ad...@usna.edu> wrote:

> Hi Eric,
>
> If you want to query by the Accumulo timestamp, something like
> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I have not
> tried it lately, but timeRange() was in Rya originally. I'm not sure if it
> was removed in later iterations or whether it would be useful for your use
> case. The first Rya paper
> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf discusses
> time ranges (Section 5.3 at the link above).
>
> Adina
>
> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <puja...@gmail.com> wrote:
>
> > Hey John,
> > I'm pretty sure your pull request was merged-- it was pulled in through
> > another pull request.  If not, sorry-- I thought it had been merged and
> > then just not closed.  I was going to spend some time doing merges
> > tomorrow so I can get it tomorrow.
> >
> > Sent from my iPhone
> >
> > > On Feb 23, 2017, at 8:13 PM, John Smith <johns0...@gmail.com> wrote:
> > >
> > > I have a pull request that fixes that problem.. it has been stuck in
> > > limbo for months.. https://github.com/apache/incubator-rya-site/pull/1
> > > Can someone merge it into master?
> > >
> > >> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <eric@capitalone.com> wrote:
> > >>
> > >> Cool, thanks for the help.
> > >> By the way, the link to the Rya Manual is outdated on the rya.apache.org
> > >> site. Should be pointing at
> > >> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
> > >>
> > >> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:
> > >>
> > >> deep vs wide:
> > >>
> > >> A property path query is probably your best bet.  Something like:
> > >>
> > >> for the following data:
> > >>
> > >> s:EventA p:causes s:EventB
> > >> s:EventB p:causes s:EventC
> > >> s:EventC p:causes s:EventD
> > >>
> > >> This query would start at EventB and work its way up and down the chain:
> > >>
> > >> SELECT * WHERE {
> > >> <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
> > >> }
> > >>
> > >> On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <caleb.me...@parsons.com> wrote:

Re: Timestamps and Cardinality in Queries

2017-02-23 Thread Liu, Eric
Cool, thanks for the help.
By the way, the link to the Rya Manual is outdated on the rya.apache.org site. 
Should be pointing at 
https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md

On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aaron.miha...@gmail.com> wrote:

deep vs wide:

A property path query is probably your best bet.  Something like:

for the following data:

s:EventA p:causes s:EventB
s:EventB p:causes s:EventC
s:EventC p:causes s:EventD


This query would start at EventB and work its way up and down the chain:

SELECT * WHERE {
<s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
}
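
To make the "up and down the chain" behavior concrete, here is a rough Python sketch of what such a path computes on the sample data, assuming the path follows p:causes in either direction, zero or more times, starting from s:EventB. It is plain in-memory Python and says nothing about how OpenRDF or Rya actually evaluates the query; `reachable_either_direction` is a hypothetical helper name.

```python
from collections import deque

# The sample data from the message above.
triples = [
    ("s:EventA", "p:causes", "s:EventB"),
    ("s:EventB", "p:causes", "s:EventC"),
    ("s:EventC", "p:causes", "s:EventD"),
]


def reachable_either_direction(triples, start, predicate):
    """Nodes reachable from `start` via `predicate`, followed forward or
    inverse, in zero or more steps (so `start` itself is included)."""
    reached = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if p != predicate:
                continue
            if s == node and o not in reached:
                # forward step: node p ?next
                reached.add(o)
                queue.append(o)
            elif o == node and s not in reached:
                # inverse step (the ^ operator): ?next p node
                reached.add(s)
                queue.append(s)
    return reached


# First pattern: bind ?s to every node on the chain through s:EventB.
nodes = reachable_either_direction(triples, "s:EventB", "p:causes")

# Second pattern (?s ?p ?o): every triple whose subject is one of those nodes.
rows = [(s, p, o) for (s, p, o) in triples if s in nodes]

print(sorted(nodes))
print(rows)
```

Because the path allows zero steps, the starting node is itself one of the bindings for ?s, which is why the triple with subject s:EventB also shows up in the rows.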


On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <caleb.me...@parsons.com>
wrote:

> Yes, that's a good place to start.  If you have external timestamps that
> are built into your graph using the time ontology in owl (e.g., you have
> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)), the
> temporal index is exactly what you want.  If you are hoping to query based
> on the internal timestamps that Accumulo assigns to your triples, then
> there are some slight tweaks that can be done to facilitate this, but it
> won't be nearly as efficient (this will require some sort of client side
> filtering).
>
> Caleb A. Meier, Ph.D.
> Software Engineer II ♦ Analyst
> Parsons Corporation
> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
> Office:  (703)797-3066
> caleb.me...@parsons.com ♦ www.parsons.com
>
> -----Original Message-----
> From: Liu, Eric [mailto:eric@capitalone.com]
> Sent: Thursday, February 23, 2017 2:27 PM
> To: dev@rya.incubator.apache.org
> Subject: Re: Timestamps and Cardinality in Queries
>
> We’d like to be able to query by timestamp; specifically, we want to be
> able to find all statements that were made within a given time range. Is
> this what I should be looking at?
> https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2
>
>
>
> On 2/22/17, 6:21 PM, "Meier, Caleb" <caleb.me...@parsons.com> wrote:
>
> Hey Eric,
>
> Currently timestamps can't be queried in Rya.  Do you need to be able
> to query by timestamp, or simply discover the timestamp for a given node?
> Rya does have a temporal index, but that requires you to use a temporal
> ontology to model the temporal properties of your graph nodes.
>
> From: Liu, Eric <eric@capitalone.com>
> Sent: Wednesday, February 22, 2017 6:38 PM
> To: dev@rya.incubator.apache.org
> Subject: Timestamps and Cardinality in Queries
>
> Hi,
>
> Continuing from our talk earlier today I was wondering if you could
> provide more information about how timestamps could be queried in Rya.
>
> Also, we are trying to support a type of query that would essentially
> be limiting on cardinality (different from the normal SPARQL limit because
> it’s for node cardinality rather than total results). I saw in one of
> Caleb’s talks that Rya’s query optimization involves checking cardinality
> first. I was wondering if there would be some way to tap into this feature
> for usage in queries?
>
> Thanks,
> Eric Liu
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete 

Re: Timestamps and Cardinality in Queries

2017-02-23 Thread Liu, Eric
We’d like to be able to query by timestamp; specifically, we want to be able to 
find all statements that were made within a given time range. Is this what I 
should be looking at? 
https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2
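
As a rough illustration of the time-range requirement only, the sketch below filters statements by an attached timestamp on the client side. It assumes each statement already carries a timestamp in epoch milliseconds; the `Statement` type, the `to_ms` and `in_time_range` helpers, and the sample values are hypothetical and are not part of Rya's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    timestamp_ms: int  # assumption: epoch milliseconds attached to the statement


def to_ms(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def in_time_range(statements, start_ms, end_ms):
    """Keep statements whose timestamp lies in [start_ms, end_ms]."""
    return [st for st in statements if start_ms <= st.timestamp_ms <= end_ms]


# Made-up statements with made-up timestamps.
statements = [
    Statement("event122", "observedBy", "sensor1",
              to_ms(datetime(2017, 2, 22, 18, 38, tzinfo=timezone.utc))),
    Statement("event123", "observedBy", "sensor1",
              to_ms(datetime(2017, 2, 23, 14, 29, tzinfo=timezone.utc))),
    Statement("event124", "observedBy", "sensor2",
              to_ms(datetime(2017, 3, 1, 12, 0, tzinfo=timezone.utc))),
]

window_start = to_ms(datetime(2017, 2, 23, tzinfo=timezone.utc))
window_end = to_ms(datetime(2017, 2, 24, tzinfo=timezone.utc))

matches = in_time_range(statements, window_start, window_end)
for st in matches:
    print(st.subject, st.timestamp_ms)
```

Filtering like this happens after the statements have been fetched, so it does not reduce how much data is read from the store.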

On 2/22/17, 6:21 PM, "Meier, Caleb" <caleb.me...@parsons.com> wrote:

Hey Eric,

Currently timestamps can't be queried in Rya.  Do you need to be able to 
query by timestamp, or simply discover the timestamp for a given node?  Rya 
does have a temporal index, but that requires you to use a temporal ontology to 
model the temporal properties of your graph nodes.
________
From: Liu, Eric <eric@capitalone.com>
Sent: Wednesday, February 22, 2017 6:38 PM
To: dev@rya.incubator.apache.org
Subject: Timestamps and Cardinality in Queries

Hi,

Continuing from our talk earlier today I was wondering if you could provide 
more information about how timestamps could be queried in Rya.
Also, we are trying to support a type of query that would essentially be 
limiting on cardinality (different from the normal SPARQL limit because it’s 
for node cardinality rather than total results). I saw in one of Caleb’s talks 
that Rya’s query optimization involves checking cardinality first. I was 
wondering if there would be some way to tap into this feature for usage in 
queries?

Thanks,
Eric Liu


The information contained in this e-mail is confidential and/or proprietary 
to Capital One and/or its affiliates and may only be used solely in performance 
of work or services for Capital One. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed. If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.




