Re: [Neo4j] Querying a full text index

2011-09-12 Thread Peter Neubauer
Yaniv,
there is an example of this in the docs,
http://docs.neo4j.org/chunked/snapshot/rest-api-indexes.html#rest-api-find-node-by-query
showing the ~ operator in action.

HTH!

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Wed, Sep 7, 2011 at 9:09 PM, Yaniv Ben Yosef yani...@gmail.com wrote:
 Hi Axel,

 I've read the syntax, which is why I was surprised. There are wildcard
 options in the syntax, e.g.: test* and test? and even te*st.
 So I would expect that [director*] should return director and directory.
 [director], if I understand the syntax correctly, should return just
 director.
 But actually, it also returns director and directory in my code.
 This means that [director] is equivalent to [director*], which I find a bit
 strange.

 In your example - the query [director] also returns both director and
 directory.
 The only thing that works is [+director].

 Thing is, I don't want to force my users to remember advanced syntax and
 append a + to each word. And I also don't want to start parsing queries.
 I imagine that the syntax in the Lucene documentation should work (i.e.,
 [director] *should not* be equivalent to [director*]). It's either a bug
 somewhere, or I'm not configuring/using something correctly.
 Does anyone have an idea?

 Thanks again,

 --- Yaniv



 On Wed, Sep 7, 2011 at 8:31 PM, Axel Morgner a...@morgner.de wrote:

 Hi Yaniv,

 I didn't try your case, just read the code. If I remember correctly, it may
 help to expand your search term director john into a Lucene query, e.g.
 something like \"director\" OR \"john\".

 For the complete Lucene query syntax, see [1].

 Greetings

 Axel

 [1] http://lucene.apache.org/java/3_1_0/queryparsersyntax.html

 On 07.09.2011 at 12:16, Yaniv Ben Yosef wrote:

  Hi,
 
  This question may be Lucene related, but since I'm using it via Neo4J I'm
  asking here first. I'm using Neo4J 1.4 M06.
  I have a graph representing people, with a few properties about each person
  (e.g., their name and job title).
  Now I'd like to create a search form that will allow the user to enter
  either the person's first name, last name, title, or any combination. For
  example, the query [john director] should return all the people whose
  name or title contains both john and director.
  To play with that, I created this little psvm:
 
  import org.neo4j.graphdb.GraphDatabaseService;
  import org.neo4j.graphdb.Node;
  import org.neo4j.graphdb.Transaction;
  import org.neo4j.graphdb.index.Index;
  import org.neo4j.graphdb.index.IndexHits;
  import org.neo4j.graphdb.index.IndexManager;
  import org.neo4j.helpers.collection.MapUtil;

  public class FullTextIndexTest
  {
      public static void main(String[] args)
      {
          // GraphDatabaseServiceFactory is a helper that is not shown in this post.
          GraphDatabaseService graphDb =
                  GraphDatabaseServiceFactory.createGraphDatabase("target/var/db");

          Transaction t = graphDb.beginTx();
          Node n1 = graphDb.createNode();
          n1.setProperty("name", "John Smith");
          n1.setProperty("title", "Directory Manager");

          Node n2 = graphDb.createNode();
          n2.setProperty("name", "Johnny Malkovich");
          n2.setProperty("title", "Director of RD");

          Node n3 = graphDb.createNode();
          n3.setProperty("name", "John Horovich");
          n3.setProperty("title", "Sr. Director");

          IndexManager index = graphDb.index();
          Index<Node> fulltextPerson = index.forNodes("person-fulltext",
                  MapUtil.stringMap(IndexManager.PROVIDER, "lucene", "type", "fulltext"));
          // index name and title together under a single "combined" key
          fulltextPerson.add(n1, "combined", n1.getProperty("name") + " " + n1.getProperty("title"));
          fulltextPerson.add(n2, "combined", n2.getProperty("name") + " " + n2.getProperty("title"));
          fulltextPerson.add(n3, "combined", n3.getProperty("name") + " " + n3.getProperty("title"));
          t.success();
          t.finish();

          // search in the fulltext index
          IndexHits<Node> hits = fulltextPerson.query("combined", "director john");
          System.out.printf("Found %d results:\n", hits.size());
          for (Node node : hits)
          {
              System.out.println(node.getProperty("name") + ", " + node.getProperty("title"));
          }
      }
  }
 
 
  I expected this program to return 1 result: John Horovich, Sr. Director
  Instead, I'm getting 3:
 
  John Horovich, Sr. Director
  John Smith, Directory Manager
  Johnny Malkovich, Director of RD
 
  It seems that Lucene will accept terms that contain a query term (e.g.,
  Directory and Johnny) even if I'm not using any wildcards in my query. How
  do I turn this behavior off? I'd like the results to contain only people
  whose name or title *contain* the word john, but not johnny.
 
  Thanks!
  --- Yaniv
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user



 

Re: [Neo4j] Querying a full text index

2011-09-12 Thread espeed
Hi Yaniv -

Something to keep in mind...

It's now easy to use an external full-text index such as Solr,
ElasticSearch, or IndexTank
(http://indextank.com/) for full-text search and then use Neo4j for
rankings.

For example, you could do the full-text query using Solr and have it
return a list of element IDs. Then if you want to use Gremlin for
ranking, you could pass in the list of element IDs to Gremlin as the
starting point of the query, and do a local rank type algorithm
(http://markorodriguez.com/2011/03/30/global-vs-local-graph-ranking/).

To make this work, a few days ago Marko updated Gremlin so you can pass in
multiple element IDs like this:

  g.v(1,2,3,4,5,6,7,8)

See https://groups.google.com/d/topic/gremlin-users/JjOopbFDHMw/discussion

- James
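The hand-off described above can be sketched as follows: take the element IDs returned by the external full-text search and format them into the multi-ID g.v(...) start expression shown in the message. This is only an illustrative sketch in plain Java. The class and method names are hypothetical, the ID list is a hard-coded stand-in for real search results, and no Solr or Gremlin API is called.

```java
import java.util.Arrays;
import java.util.List;

public class GremlinStartIds {
    // Formats element IDs as a Gremlin start expression such as g.v(1,2,3).
    // Hypothetical helper: it only builds the string, it does not run Gremlin.
    static String startExpression(List<Long> ids) {
        StringBuilder sb = new StringBuilder("g.v(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(ids.get(i));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Stand-in for the IDs an external full-text index would return.
        List<Long> searchIds = Arrays.asList(1L, 2L, 3L);
        System.out.println(startExpression(searchIds)); // prints g.v(1,2,3)
    }
}
```

The ranking traversal itself would then be appended to that start expression on the Gremlin side.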



Re: [Neo4j] Querying a full text index

2011-09-08 Thread Romiko Derbynew
Remember that the default fuzzy match similarity is 0.5 (e.g. director~0.5), which is
why it matches terms that differ by up to two letters, e.g. ditectof, directors, etc.
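
To make the "letter differences" concrete, here is a small sketch of plain Levenshtein edit distance in Java. It is not Lucene's similarity formula, and the thread does not confirm that fuzzy matching is applied to a term written without the ~ operator. The class and method names are hypothetical.

```java
public class EditDistanceSketch {
    // Classic dynamic-programming Levenshtein distance:
    // the minimum number of single-character insertions,
    // deletions, and substitutions that turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("director", "ditectof"));  // 2
        System.out.println(levenshtein("director", "directors")); // 1
        System.out.println(levenshtein("director", "directory")); // 1
    }
}
```

Note that directory is a single edit away from director, so any fuzzy match that tolerates two edits would also accept it.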



[Neo4j] Querying a full text index

2011-09-07 Thread Yaniv Ben Yosef
Hi,

This question may be Lucene related, but since I'm using it via Neo4J I'm
asking here first. I'm using Neo4J 1.4 M06.
I have a graph representing people, with a few properties about each person
(e.g., their name and job title).
Now I'd like to create a search form that will allow the user to enter
either the person's first name, last name, title, or any combination. For
example, the query [john director] should return all the people whose
name or title contains both john and director.
To play with that, I created this little psvm:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexHits;
import org.neo4j.graphdb.index.IndexManager;
import org.neo4j.helpers.collection.MapUtil;

public class FullTextIndexTest
{
    public static void main(String[] args)
    {
        // GraphDatabaseServiceFactory is a helper that is not shown in this post.
        GraphDatabaseService graphDb =
                GraphDatabaseServiceFactory.createGraphDatabase("target/var/db");

        Transaction t = graphDb.beginTx();
        Node n1 = graphDb.createNode();
        n1.setProperty("name", "John Smith");
        n1.setProperty("title", "Directory Manager");

        Node n2 = graphDb.createNode();
        n2.setProperty("name", "Johnny Malkovich");
        n2.setProperty("title", "Director of RD");

        Node n3 = graphDb.createNode();
        n3.setProperty("name", "John Horovich");
        n3.setProperty("title", "Sr. Director");

        IndexManager index = graphDb.index();
        Index<Node> fulltextPerson = index.forNodes("person-fulltext",
                MapUtil.stringMap(IndexManager.PROVIDER, "lucene", "type", "fulltext"));
        // index name and title together under a single "combined" key
        fulltextPerson.add(n1, "combined", n1.getProperty("name") + " " + n1.getProperty("title"));
        fulltextPerson.add(n2, "combined", n2.getProperty("name") + " " + n2.getProperty("title"));
        fulltextPerson.add(n3, "combined", n3.getProperty("name") + " " + n3.getProperty("title"));
        t.success();
        t.finish();

        // search in the fulltext index
        IndexHits<Node> hits = fulltextPerson.query("combined", "director john");
        System.out.printf("Found %d results:\n", hits.size());
        for (Node node : hits)
        {
            System.out.println(node.getProperty("name") + ", " + node.getProperty("title"));
        }
    }
}


I expected this program to return 1 result: John Horovich, Sr. Director
Instead, I'm getting 3:

John Horovich, Sr. Director
John Smith, Directory Manager
Johnny Malkovich, Director of RD

It seems that Lucene will accept terms that contain a query term (e.g.,
Directory and Johnny) even if I'm not using any wildcards in my query. How
do I turn this behavior off? I'd like the results to contain only people
whose name or title *contain* the word john, but not johnny.

Thanks!
--- Yaniv
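
One possible reading of the three hits, assuming the query is parsed with OR between terms (Lucene's query parser syntax documents OR as the default conjunction operator): [director john] then means director OR john, so John Smith matches on john and Johnny Malkovich matches on director, with no prefix matching involved. The sketch below is plain Java and makes no Lucene or Neo4j calls. The class, the crude tokenizer, and the search method are hypothetical stand-ins that only model the two operators on the three sample records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DefaultOperatorSketch {
    // Crude stand-in for an analyzer: lowercase the text and split
    // on anything that is not an ASCII letter or digit.
    static Set<String> tokens(String text) {
        Set<String> out = new LinkedHashSet<String>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // requireAll = false models OR between query terms, true models AND.
    // Only whole tokens are compared, so john never matches johnny.
    static List<String> search(List<String> docs, String query, boolean requireAll) {
        Set<String> terms = tokens(query);
        List<String> hits = new ArrayList<String>();
        for (String doc : docs) {
            Set<String> docTokens = tokens(doc);
            boolean all = docTokens.containsAll(terms);
            boolean any = !Collections.disjoint(docTokens, terms);
            if (requireAll ? all : any) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "John Smith Directory Manager",
                "Johnny Malkovich Director of RD",
                "John Horovich Sr. Director");
        System.out.println(search(docs, "director john", false).size()); // 3
        System.out.println(search(docs, "director john", true)); // [John Horovich Sr. Director]
    }
}
```

Under this model the OR reading reproduces the three results reported above, and the AND reading returns only the one record that was expected.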


Re: [Neo4j] Querying a full text index

2011-09-07 Thread Rick Bullotta
While I don't know that it will change anything, is there any reason you're using
M06 and not 1.4.1?  There have been quite a few important fixes.  Also, the
analyzer that is used to tokenize both the indexed content and the query has
an effect on the query processing.  In any case, I would update to 1.4.1 so
that diagnosing the issues would be significantly easier.




Re: [Neo4j] Querying a full text index

2011-09-07 Thread Yaniv Ben Yosef
Hi Rick,

Thanks, I will try upgrading and see if it fixes the issue. I have a feeling
that I'm missing something here, though.

--- Yaniv





Re: [Neo4j] Querying a full text index

2011-09-07 Thread Yaniv Ben Yosef
Hi Axel,

I've read the syntax, which is why I was surprised. There are wildcard
options in the syntax, e.g.: test* and test? and even te*st.
So I would expect that [director*] should return director and directory.
[director], if I understand the syntax correctly, should return just
director.
But actually, it also returns director and directory in my code.
This means that [director] is equivalent to [director*], which I find a bit
strange.

In your example - the query [director] also returns both director and
directory.
The only thing that works is [+director].

Thing is, I don't want to force my users to remember advanced syntax and
append a + to each word. And I also don't want to start parsing queries.
I imagine that the syntax in the Lucene documentation should work (i.e.,
[director] *should not* be equivalent to [director*]). It's either a bug
somewhere, or I'm not configuring/using something correctly.
Does anyone have an idea?

Thanks again,

--- Yaniv
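
If [+director] behaves as wanted, the + can be added on the application side so that users never type it. The following is a minimal sketch under that assumption. It is plain Java, the class and method names are hypothetical, and it only splits on whitespace: it does not escape Lucene special characters or handle quoted phrases.

```java
public class RequireAllTerms {
    // Prefixes every whitespace-separated term with '+' so that the
    // query parser treats each term as required. Terms that already
    // start with '+' or '-' are left unchanged.
    static String requireAll(String userInput) {
        StringBuilder sb = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields an empty query
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (term.startsWith("+") || term.startsWith("-")) {
                sb.append(term);
            } else {
                sb.append('+').append(term);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(requireAll("director john")); // +director +john
    }
}
```

The rewritten string would then be passed as the query in place of the raw user input. Depending on the Neo4j version, the Lucene integration may also offer a way to change the default operator directly (for example through its QueryContext class), which would avoid rewriting the query; that is worth checking in the documentation for the version in use.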


