Re: [Neo4j] Will there ever be a native SDK for Node.js?

2011-12-01 Thread Tero Paananen
> With REST, you have a separate server. Thus multiple applications
> can use it. Or you can access it directly through a normal web browser
> to fix your data.
> Or even run background jobs against that server. All that is
> not possible with native.

I wouldn't say that. It's not possible using Neo4J APIs
directly, but with Spring Data for Neo4J and the rest of
the Spring framework (as an example) it's pretty damn
easy to provide remote access to a Neo4J data store.
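
Just to illustrate, here's a rough sketch of a read-only endpoint on top of
an SDN repository (all names made up; you'd obviously add security, error
handling etc., and this assumes a JSON message converter is configured):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
@RequestMapping("/users")
public class UserController {

    // an SDN repository, e.g. one extending GraphRepository<User>
    @Autowired
    private UserRepository userRepository;

    // expose a single node over HTTP, serialized by Spring MVC
    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    @ResponseBody
    public User getUser(@PathVariable("id") Long id) {
        return userRepository.findOne(id);
    }
}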

-TPP


Re: [Neo4j] Will there ever be a native SDK for Node.js?

2011-12-01 Thread Tero Paananen
On 12/1/2011 10:47 PM, Dmytrii Nagirniak wrote:
>
> On 01/12/2011, at 11:34 PM, Tero Paananen wrote:
>
>> Spring Data for Neo4J and the rest of
>> the Spring framework (as an example) it's pretty damn
>> easy to provide remote access to a Neo4J data store.
>
> Are you saying that your WEB application should become a database server?

No. I'm saying that you have options that will allow
you to easily adapt your solution to whatever your
needs are without putting everything and the kitchen
sink into the database server.

-TPP


Re: [Neo4j] Cypher Query Optimizing

2011-11-30 Thread Tero Paananen
> Cypher:
>
> START n=node:words(w_id = 137) MATCH n-[:CO_S]-m, n-[:CO_S]-t,
> m-[r:CO_S]-t return m.w_id, t.w_id, r.sig, r.freq
>
> The results are the same, but the Cypher query is about 10 times slower
> than the SQL equivalent. I currently do not use any additional indices,
> just a map (words) between my word IDs and the Neo4j node ids.

Doing index lookups is much slower than looking nodes up directly by node id.

In the application I'm developing, we've removed most index lookups
and replaced them with node lookups. We keep a keyword to nodeId
mapping in a key/value store and lookup the nodeId before running any
Cypher queries.
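
The idea, roughly (a simplified sketch; the HashMap here just stands in for
whatever key/value store you use, and the relationship type is made up):

import java.util.HashMap;
import java.util.Map;

public class KeywordLookup {

    // stand-in for the external key/value store (keyword -> Neo4j node id)
    private final Map<String, Long> keywordToNodeId = new HashMap<String, Long>();

    public String queryFor(String keyword) {
        // resolve the keyword outside of Neo4j so Cypher can start from a node id
        Long nodeId = keywordToNodeId.get(keyword);
        return "START n=node(" + nodeId + ") MATCH n-[:KNOWS]-m RETURN m";
    }
}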

In your case:

START n=node(1) MATCH n-[:CO_S]-m, n-[:CO_S]-t,
m-[r:CO_S]-t return m.w_id, t.w_id, r.sig, r.freq

(where 1 is the node id corresponding to w_id = 137)

If I understood your email correctly, you already have that mapping available
to you (137 -> 1). I'd use it to see if it's any quicker.

Your mileage may vary, of course. In our application the speed improvements
were roughly 10x. See my post on the mailing list with subject:

Comparing Lucene index lookup performance to lookup by node id

-TPP


Re: [Neo4j] Cypher query question

2011-11-22 Thread Tero Paananen
> Is it possible to filter the results to skip the start node?
>
> For example, using the koans dataset, find the enemy of my enemy (they should
> be my friends :) )
>
> START n=node(1)
> MATCH (n)-[:ENEMY_OF]-(x)-[:ENEMY_OF]-(z)
> return distinct z
>
> I understand that the Doctor node (node 1) should appear, as it is indeed an
> enemy of one of his enemies, but is it possible to filter it out?

This may or may not be the best way to do this, but I have this sort
of stuff in some of my queries for the same purpose:

... WHERE ID(n) != ID(z) ...
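
Applied to your query, something like this (untested):

START n=node(1)
MATCH (n)-[:ENEMY_OF]-(x)-[:ENEMY_OF]-(z)
WHERE ID(n) != ID(z)
return distinct z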

-TPP


[Neo4j] Bug (or feature?) in using regular expressions in Cypher where clauses

2011-11-17 Thread Tero Paananen
This is using 1.5 GA.

I think I found a bug (or unsupported feature) in how regular
expressions used in where clauses work in Cypher queries.

It doesn't look like having multiple regular expression conditions
in a where clause back to back works too well.

Here's an example that works:

start n = node(1) match (n)-[:KNOWS]-(k) where k.a =~ /foo.*/ return k

Returns all connected nodes where property a starts with foo.


Here's an example that doesn't work:

start n = node(1) match (n)-[:KNOWS]-(k) where k.a =~ /foo.*/ or k.b
=~ /foo.*/ return k

This always returns an empty result. So does changing the or to an and.


Here's an example that does work:

start n = node(1) match (n)-[:KNOWS]-(k) where k.a =~ /foo.*/ or k.a
= "foo" or k.b =~ /foo.*/ return k

The introduction of a non-regular expression condition makes the
query work as intended.


This one does not work:

start n = node(1) match (n)-[:KNOWS]-(k) where k.a =~ /foo.*/ or k.b
=~ /foo.*/ or k.a = "foo" return k

Looks like a bug to me, and it seems to be caused by where clauses
with more than one regular expression condition next to each other.

-TPP


[Neo4j] How do I use parameters in Cypher queries using regular expressions?

2011-11-17 Thread Tero Paananen
Another regular expression related issue in Cypher. 1.5 GA.

I'm using repositories extensively in my application. I'm extending the
queries defined by the repository interfaces with my own using the @Query
annotation.

It seems as if using parameterized regular expressions in these queries
isn't working as I'd expect it to.

Here's an example:

@Query(value = "start n = node(1) match (n)-[:KNOWS]-(c) where c.a =~
/.*?{foo}.*?/ return c", type = QueryType.Cypher)
Page<Foo> getConnectedNodes(@Param("foo") String foo, Pageable pageable);


It looks like the stuff inside {} is parsed as part of the regular expression.

I tried it with %foo as well.

Looks like parameters inside regular expressions aren't being replaced
at all.

-TPP


Re: [Neo4j] How do I use parameters in Cypher queries using regular expressions?

2011-11-17 Thread Tero Paananen
>> @Query(value = "start n = node(1) match (n)-[:KNOWS]-(c) where c.a =~
>> /.*?{foo}.*?/ return c", type = QueryType.Cypher)
>> Page<Foo> getConnectedNodes(@Param("foo") String foo, Pageable pageable);
>
> The regular expression can either be provided through a parameter, or
> inlined in the query. You are trying to do a mix of both, and that is not
> supported in Cypher. You have to create the full regular expression, and
> then pass that on to your query.
>
> Your query would have to look like this:
> start n = node(1)
> match (n)-[:KNOWS]-(c)
> where c.a =~ {foo}
> return c
>
> Does that make sense?

It does.

However, there's still something not quite right about this.

Here's my query:

@Query(value = "start user = node({nodeId}) match (user)-[:KNOWS]-(c)
where c.firstName =~ {keyword} return c", type = QueryType.Cypher)
Page<User> searchUsers(@Param("nodeId") Long nodeId, @Param("keyword")
String keyword, Pageable pageable);
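
The call site looks more or less like this (a simplified sketch; class names
and the page size are made up):

import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;

public class UserSearch {

    private final UserRepository userRepository;

    public UserSearch(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    public Page<User> search(Long nodeId, String input) {
        // build the complete regular expression up front and pass it in as {keyword}
        String keyword = "/.*?" + input + ".*?/";
        return userRepository.searchUsers(nodeId, keyword, new PageRequest(0, 20));
    }
}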

Calls to this query result in an exception:

org.neo4j.cypher.ParameterNotFoundException: Expected a parameter named keyword
at org.neo4j.cypher.commands.ParameterValue$$anonfun$apply$2.apply(Value.scala:136)
at org.neo4j.cypher.commands.ParameterValue$$anonfun$apply$2.apply(Value.scala:136)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:122)
at scala.collection.immutable.Map$Map3.getOrElse(Map.scala:143)
at org.neo4j.cypher.commands.ParameterValue.apply(Value.scala:136)
at org.neo4j.cypher.commands.ParameterValue.apply(Value.scala:135)
at org.neo4j.cypher.commands.RegularExpression.isMatch(Clause.scala:101)
at org.neo4j.cypher.pipes.matching.PatternMatcher$$anonfun$isMatchSoFar$1.apply(PatternMatcher.scala:143)
at org.neo4j.cypher.pipes.matching.PatternMatcher$$anonfun$isMatchSoFar$1.apply(PatternMatcher.scala:143)

I've stepped inside the Neo4J code in the debugger, and I know the keyword param
is populated with the right type of value (/.*?foo.*?/) before the
query gets executed.

If I change the query to:


@Query(value = "start user = node({nodeId}) match (user)-[:KNOWS]-(c)
where c.firstName = {keyword} return c", type = QueryType.Cypher)
Page<User> searchUsers(@Param("nodeId") Long nodeId, @Param("keyword")
String keyword, Pageable pageable);

and execute it with the same keyword param value, the query runs just
fine.

Any ideas?

-TPP


Re: [Neo4j] Cypher-Pickle?

2011-11-04 Thread Tero Paananen
My take on this is to not use a SQL-like query language.

The reason is that while it looks like SQL, it is not SQL, and the subtle
differences would be more confusing than using a query language
that doesn't share SQL-like constructs.

-TPP


Re: [Neo4j] Cypher-Pickle?

2011-11-04 Thread Tero Paananen
> I'd say the strongest part of Cypher is the ascii art pattern where you
> clearly see what you're querying for, right there and then, without having
> to parse it into a graph in your head. Removing that would reduce my
> interest in this language significantly.

I strongly agree with this. It's EASY to see the relationships and
their direction with the syntax right now.

(cani)-[:HAS]-(more)-[:CHEEZ]-(burger)

I glance at that and instantly figure out what it's trying to say. The SQL-
like examples I've seen so far aren't even coming close, IMHO. And
as the query complexity increases, I think the advantage Cypher's
syntax has increases even more.

Additionally, I don't see how adding a join keyword to a query language for
a data store that has no joins is better in any way, shape or form.

-TPP


[Neo4j] Comparing Lucene index lookup performance to lookup by node id

2011-11-03 Thread Tero Paananen
This is probably not news to anyone, but I might as well post about
it in case new users are wondering about the performance difference
between index-based lookups and lookups by node id.

I have a test database of 750,000 nodes of type A.

The db also contains 90,000 nodes of types B and C, and roughly
4M relationships between A-B and A-C (so two different relationship
types). The size on disk is 4.7GB, of which the Lucene index takes
2.3GB or so.

Nodes of type A have three properties: one indexed with a fulltext index,
and an id-type property indexed with an exact index (the property is a
string). Let's call that property guid. The relationships and the other node
types also have indexed properties, each indexed in its own index. There are
about 14M properties in the db.

To test the performance, I generate a list of all node IDs and guid property
values, perform 400,000 lookups using random entries from those lists, and
record the total execution time of the 400,000 lookups.

This is on a box with 8GB of RAM, and the performance runs are nowhere
near using all that memory.

I'm using SDN 2.0.0 M1 to access the data. The node id lookups are
done with the findOne(Long id) method in the CRUDRepository class
and the guid property lookups are done with the
findByPropertyValue(String indexName, String property, Object value)
method in the NamedIndexRepository class.
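
For reference, the benchmark loop is roughly this shape (a simplified sketch;
the repository type and index name are made up):

import java.util.List;
import java.util.Random;

public class LookupBenchmark {

    private static final int LOOKUPS = 400000;

    // hypothetical SDN repository extending CRUDRepository<NodeA> and NamedIndexRepository<NodeA>
    private final NodeARepository repository;
    private final List<Long> nodeIds;   // all node ids, collected up front
    private final List<String> guids;   // all guid property values, collected up front
    private final Random random = new Random();

    public LookupBenchmark(NodeARepository repository, List<Long> nodeIds, List<String> guids) {
        this.repository = repository;
        this.nodeIds = nodeIds;
        this.guids = guids;
    }

    public long timeNodeIdLookups() {
        long start = System.currentTimeMillis();
        for (int i = 0; i < LOOKUPS; i++) {
            repository.findOne(nodeIds.get(random.nextInt(nodeIds.size())));
        }
        return System.currentTimeMillis() - start;
    }

    public long timeIndexLookups() {
        long start = System.currentTimeMillis();
        for (int i = 0; i < LOOKUPS; i++) {
            repository.findByPropertyValue("guid_index", "guid",
                    guids.get(random.nextInt(guids.size())));
        }
        return System.currentTimeMillis() - start;
    }
}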

Using default settings for the graph db.

The node id lookups run in about 12,700ms.

The index-based guid property lookups run in about 123,000ms.

So roughly a 10x performance difference.

-TPP


[Neo4j] Cypher where regular expressions - case insensitive regexs?

2011-11-03 Thread Tero Paananen
re: 
http://docs.neo4j.org/chunked/snapshot/query-where.html#where-regular-expressions

Is there a way to use case insensitive regular expressions in Cypher where
clauses?

node.property =~ /foo.*/i

does not appear to work.

This would be a great addition to the language, if it's currently unsupported.

-TPP


Re: [Neo4j] Comparing Lucene index lookup performance to lookup by node id

2011-11-03 Thread Tero Paananen
> Indexes, while fast, are still an indirection and way slower than direct
> access of something. So this is quite expected.

Agreed. I wanted to run the performance tests to find out how much
slower the index lookups are. Had they been 2x - 4x slower, it would
probably still have been acceptable for my particular use case.

At 10x, I need to think about alternatives.

-TPP


Re: [Neo4j] Comparing Lucene index lookup performance to lookup by node id

2011-11-03 Thread Tero Paananen
> So I'd love your input by profiling your use-case. It is probably something
> in between.
>
> Can you use visualvm or yourkit or another profiler to figure out the hotspot
> methods where the most time is spent for the index-lookup? I would also love
> to pair with you on this. (Or get your data-generator and use-cases and
> profile it myself.)
>
> Could you please try to run the same test with the raw neo4j API to see the
> difference?

Sure thing. I'll try and do this tomorrow, but it might have to wait
until next week.

> Another note:
>
> SDN was never intended to be a tool for pulling mass data from the graph into
> memory, as it adds some overhead for management and object creation. The most
> important use-cases for SDN are that it gives you an easy way to work with the
> graph in terms of your different domain models, and that it eases the
> integration of other libraries (e.g. mvc) that require domain POJOs.
> For mass-data handling it might be sensible to drop down to the core neo4j
> API (by getting the index used explicitly and pulling the nodes with
> index.get(key,value)).

Absolutely, and this is exactly why I'm using SDN. The performance tests I ran
are by no means supposed to simulate normal uses. The repetitions were there
to make sure I could get meaningful run times for the micro benchmark.
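
For anyone curious, the core-API lookup suggested above looks roughly like
this (a minimal sketch; the index name is made up):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.index.Index;

public class GuidLookup {

    private final Index<Node> guidIndex;

    public GuidLookup(GraphDatabaseService graphDb) {
        // grab the exact index by name instead of going through SDN
        this.guidIndex = graphDb.index().forNodes("guid_index");
    }

    public Node byGuid(String guid) {
        // get(key, value) hits Lucene directly; getSingle() assumes guids are unique
        return guidIndex.get("guid", guid).getSingle();
    }
}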

-TPP


[Neo4j] SDN w/ write and readonly Graph Database Service?

2011-11-01 Thread Tero Paananen
I'm using SDN to build a graph db that I expect to have
quite a heavy volume of write and read activity at peak
times. We're not ready to start using HA at this point.

I was wondering if using two instances of the GraphDatabaseService,
one EmbeddedGraphDatabase (for writes) and one or more
EmbeddedReadOnlyGraphDatabase instances (for reads), would make any
difference?

Are there any benefits to splitting the db access
that way? Or would accessing the db through the
writable instance for all operations be pretty
much the same?
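
To be concrete, the kind of setup I have in mind would look roughly like this
(just a sketch; the store path is made up, and I haven't verified that two
embedded instances can share a store like this):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.kernel.EmbeddedGraphDatabase;
import org.neo4j.kernel.EmbeddedReadOnlyGraphDatabase;

public class GraphDatabases {

    // both instances would point at the same store directory
    private static final String STORE_DIR = "data/graph.db";

    // all writes would go through the writable instance
    private final GraphDatabaseService writableDb = new EmbeddedGraphDatabase(STORE_DIR);

    // reads would be served by one or more read-only instances
    private final GraphDatabaseService readOnlyDb = new EmbeddedReadOnlyGraphDatabase(STORE_DIR);

    public GraphDatabaseService forWrites() {
        return writableDb;
    }

    public GraphDatabaseService forReads() {
        return readOnlyDb;
    }
}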

-TPP


Re: [Neo4j] SDN w/ write and readonly Graph Database Service?

2011-11-01 Thread Tero Paananen
On Tue, Nov 1, 2011 at 12:26 PM, Rick Bullotta
rick.bullo...@thingworx.com wrote:
> Probably the opposite (if you could even do it). You'd lose the LRU caching
> across the boundary.
>
> Is the data being written the same as the data being read, or is there a
> natural segmentation? If so you could implement a crude form of
> sharding/partitioning to avoid hot spots (concurrency related) during these
> periods.

It's largely going to be the same data. It's definitely the same type of data.

Basically there will be discrete sets of the same type of data (kinda like
sub-graphs, but they can be connected) inserted in a batch-like manner. Those
discrete sets of data will then immediately be consumed by processes that use
them to calculate additional data and store the results to various data
stores, incl. the same Neo4J db. After the initial consumption, the data sets
will be consumed on demand, but I'm not sure how frequently at this point.
It'll likely be something like once or twice a day or even less frequently,
but I'll have to see how the usage patterns emerge after the solution goes
live to be sure.

Thanks for the answer Rick. I'll do some quick-and-dirty testing later this week
to guide the decision making on this. I'll see if I can post what I find on this
thread afterwards.

-TPP


[Neo4j] Spring Data Graph 2.0.0 and Cypher queries using Repositories

2011-10-24 Thread Tero Paananen
I just upgraded to Spring Data Graph for Neo4J 2.0.0.M1 and it
looks like certain things changed for the worse for my needs.

Just looking for clarification on whether these changes are
permanent or will be addressed before the final release.

I'm using Cypher queries defined in Repository classes, e.g.:

@Repository
public interface CustomRepository extends GraphRepository<User> {
   @Query(value = "start u = node:user(name = '%foo') match
(u)-[:KNOWS*1..%depth]-() return u", type = QueryType.Cypher)
   Iterable<User> getConnections(@Param("foo") String foo, @Param("depth")
Integer depth);
}

This used to work just fine in 1.1.0.RELEASE.

In 2.0.0.M1, %foo should be {foo}, and %depth produces a syntax
error regardless of whether I specify it as %depth or {depth}.
Same with the skip and limit clauses:

.. return u skip %skip limit %limit  

used to work just fine, however

.. return u skip {skip} limit {limit}

no longer works.

I know I could probably replicate this behavior using the
Neo4JTemplate functionality, but I'm not sure that's actually
a better way of doing that, considering how convenient it is
to create queries with the @Query annotation.

Your thoughts? And what would be my best options for alternatives
at this point?

Thanks!

-TPP