Re: [Neo4j] Querying a full text index
Yaniv,

there is an example of this in the docs, http://docs.neo4j.org/chunked/snapshot/rest-api-indexes.html#rest-api-find-node-by-query, showing the ~ operator in action. HTH!

Cheers,
/peter neubauer

___
Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Querying a full text index
Hi Yaniv -

Something to keep in mind... It's now easy to use an external full-text index such as Solr, ElasticSearch, or IndexTank (http://indextank.com/) for full-text search and then use Neo4j for rankings.

For example, you could do the full-text query using Solr and have it return a list of element IDs. Then if you want to use Gremlin for ranking, you could pass in the list of element IDs to Gremlin as the starting point of the query, and do a local rank type algorithm (http://markorodriguez.com/2011/03/30/global-vs-local-graph-ranking/).

To make this work, a few days ago Marko updated Gremlin so you can pass in multiple element IDs like this:

    g.v(1,2,3,4,5,6,7,8)

See https://groups.google.com/d/topic/gremlin-users/JjOopbFDHMw/discussion

- James
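To make the "IDs in, ranking out" idea above a little more concrete, here is a minimal sketch in plain Java. It does not use Gremlin, Solr, or Neo4j: the graph is a hypothetical in-memory adjacency map, and the scoring (count how many seed vertices point directly at each neighbour) is only one very simple stand-in for a local rank, chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocalRankSketch {

    /**
     * Scores every vertex by the number of seed vertices that link to it directly.
     * "outgoing" is a hypothetical adjacency map (vertex id -> ids it points to);
     * "seedIds" stands in for the element IDs returned by the full-text search.
     */
    static Map<Long, Integer> rankNeighbors(Map<Long, List<Long>> outgoing, List<Long> seedIds) {
        Map<Long, Integer> score = new HashMap<>();
        for (Long seed : seedIds) {
            List<Long> neighbors = outgoing.get(seed);
            if (neighbors == null) {
                continue; // seed has no outgoing edges in this toy graph
            }
            for (Long neighbor : neighbors) {
                Integer old = score.get(neighbor);
                score.put(neighbor, old == null ? 1 : old + 1);
            }
        }
        return score;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> outgoing = new HashMap<>();
        outgoing.put(1L, Arrays.asList(10L, 11L));
        outgoing.put(2L, Arrays.asList(10L));
        // Pretend the full-text index returned vertices 1, 2 and 3.
        System.out.println(rankNeighbors(outgoing, Arrays.asList(1L, 2L, 3L)));
    }
}
```

The point of the split is that the text index only has to produce the seed list; everything relationship-based happens afterwards on the graph side.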
Re: [Neo4j] Querying a full text index
Remember that the default fuzzy match threshold is 0.5, i.e. director~ behaves like director~0.5, which is why it matches terms with up to two letters' difference, e.g. ditectof, directors, etc.
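A rough way to see what a 0.5 threshold on the ~ operator lets through is to compute an edit-distance-based similarity by hand. The sketch below is plain Java, not Lucene code. It assumes the score is 1 - (edit distance / length of the shorter term) and that a term matches when the score is strictly above the threshold; that formula is an assumption about how Lucene 3.x scores fuzzy matches, not something stated in this thread.

```java
final class FuzzySimilarity {

    /** Classic Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** ASSUMED scoring: 1 - distance / length of the shorter of the two terms. */
    static double similarity(String query, String indexed) {
        int shorter = Math.min(query.length(), indexed.length());
        int distance = editDistance(query, indexed);
        if (shorter == 0) {
            return distance == 0 ? 1.0 : 0.0;
        }
        return 1.0 - (double) distance / shorter;
    }

    /** ASSUMED rule: a term matches when its similarity is strictly above the threshold. */
    static boolean fuzzyMatches(String query, String indexed, double minSimilarity) {
        return similarity(query, indexed) > minSimilarity;
    }

    public static void main(String[] args) {
        String[] indexedTerms = {"director", "directory", "directors", "ditectof"};
        for (String term : indexedTerms) {
            System.out.printf("director vs %s: distance=%d similarity=%.3f%n",
                    term, editDistance("director", term), similarity("director", term));
        }
    }
}
```

Note that this only applies when the query actually uses ~; a bare term such as director is not a fuzzy query under the Lucene syntax referenced in the thread.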
[Neo4j] Querying a full text index
Hi,

This question may be Lucene related, but since I'm using it via Neo4J I'm asking here first. I'm using Neo4J 1.4 M06.

I have a graph representing people, with a few properties about each person (e.g., their name and job title). Now I'd like to create a search form that will allow the user to enter either the person's first name, last name, title, or any combination. For example, the query [john director] should return all the people whose name or title contain both john and director.

To play with that, I created this little psvm:

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.index.Index;
    import org.neo4j.graphdb.index.IndexHits;
    import org.neo4j.graphdb.index.IndexManager;
    import org.neo4j.helpers.collection.MapUtil;

    public class FullTextIndexTest {
        public static void main(String[] args) {
            // GraphDatabaseServiceFactory is not shown in this post (its import is omitted here)
            GraphDatabaseService graphDb = GraphDatabaseServiceFactory.createGraphDatabase("target/var/db");
            Transaction t = graphDb.beginTx();

            Node n1 = graphDb.createNode();
            n1.setProperty("name", "John Smith");
            n1.setProperty("title", "Directory Manager");
            Node n2 = graphDb.createNode();
            n2.setProperty("name", "Johnny Malkovich");
            n2.setProperty("title", "Director of RD");
            Node n3 = graphDb.createNode();
            n3.setProperty("name", "John Horovich");
            n3.setProperty("title", "Sr. Director");

            // one fulltext index entry per person: name and title combined
            IndexManager index = graphDb.index();
            Index<Node> fulltextPerson = index.forNodes("person-fulltext",
                    MapUtil.stringMap(IndexManager.PROVIDER, "lucene", "type", "fulltext"));
            fulltextPerson.add(n1, "combined", n1.getProperty("name") + " " + n1.getProperty("title"));
            fulltextPerson.add(n2, "combined", n2.getProperty("name") + " " + n2.getProperty("title"));
            fulltextPerson.add(n3, "combined", n3.getProperty("name") + " " + n3.getProperty("title"));
            t.success();
            t.finish();

            // search in the fulltext index
            IndexHits<Node> hits = fulltextPerson.query("combined", "director john");
            System.out.printf("Found %d results:\n", hits.size());
            for (Node node : hits) {
                System.out.println(node.getProperty("name") + ", " + node.getProperty("title"));
            }
        }
    }

I expected this program to return 1 result:

    John Horovich, Sr. Director

Instead, I'm getting 3:

    John Horovich, Sr. Director
    John Smith, Directory Manager
    Johnny Malkovich, Director of RD

It seems that Lucene will accept terms that contain a query term (e.g., Directory and Johnny) even if I'm not using any wildcards in my query. How do I turn this behavior off? I'd like the results to contain only people whose name or title *contain* the word john, but not johnny.

Thanks!
--- Yaniv
Re: [Neo4j] Querying a full text index
While I don't know that it will change anything, is there any reason that you're using M06 and not 1.4.1? There have been quite a few important fixes.

Also, the analyzer that is used to tokenize both the indexed content and the query has an effect on the query processing.

In any case, I would update to 1.4.1 so that diagnosing the issue would be significantly easier.
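The thread does not settle why three results come back. One reading that fits the reported output exactly is that the index tokenizes each entry into whole words and the two query terms are then combined with OR, which is the Lucene query parser's default between terms. The toy below is plain Java, not Neo4j or Lucene code; the tokenizer, the `search` helper, and the `requireAll` flag are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class DefaultOperatorSketch {

    /** Toy analyzer: lower-case the text and split on anything that is not a letter or digit. */
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Returns the keys of documents matching ALL query terms (requireAll) or ANY of them. */
    static List<String> search(Map<String, String> docs, String query, boolean requireAll) {
        Set<String> queryTerms = tokenize(query);
        List<String> hits = new ArrayList<>();
        if (queryTerms.isEmpty()) {
            return hits; // nothing to search for
        }
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            Set<String> docTerms = tokenize(doc.getValue());
            Set<String> common = new HashSet<>(queryTerms);
            common.retainAll(docTerms); // whole-token matches only: "john" does not match "johnny"
            boolean matches = requireAll ? common.size() == queryTerms.size() : !common.isEmpty();
            if (matches) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    /** The three people from the original post, keyed by name, value = name + " " + title. */
    static Map<String, String> sampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("John Smith", "John Smith Directory Manager");
        docs.put("Johnny Malkovich", "Johnny Malkovich Director of RD");
        docs.put("John Horovich", "John Horovich Sr. Director");
        return docs;
    }

    public static void main(String[] args) {
        System.out.println("OR:  " + search(sampleDocs(), "director john", false));
        System.out.println("AND: " + search(sampleDocs(), "director john", true));
    }
}
```

Under OR, "John Smith" matches on john alone and "Johnny Malkovich" matches on director alone, so no prefix matching is needed to explain three hits. If this reading is right, it would also explain why [+director] behaves differently. Neo4j's Lucene index documentation describes a `QueryContext.defaultOperator(...)` setting for switching to AND; whether it is available in 1.4 M06 is not confirmed anywhere in this thread.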
Re: [Neo4j] Querying a full text index
Hi Rick,

Thanks, I will try upgrading and see if it fixes the issue. I have a feeling that I'm missing something here though...

--- Yaniv

On Wed, Sep 7, 2011 at 2:16 PM, Rick Bullotta rick.bullo...@thingworx.com wrote:

While I don't know that it will change anything, is there any reason that you're using M06 and not 1.4.1? There have been quite a few important fixes. Also, the analyzer that is used to tokenize both the indexed content and the query has an effect on the query processing. In any case, I would update to 1.4.1 so that diagnosing the issue would be significantly easier.
Re: [Neo4j] Querying a full text index
Hi Axel,

I've read the syntax, which is why I was surprised. There are wildcard options in the syntax, e.g.: test* and test? and even te*st. So I would expect that [director*] should return director and directory. [director], if I understand the syntax correctly, should return just director. But actually, it also returns director and directory in my code. This means that [director] is equivalent to [director*], which I find a bit strange.

In your example, the query [director] also returns both director and directory. The only thing that works is [+director]. Thing is, I don't want to force my users to remember advanced syntax and append a + to each word. And I also don't want to start parsing queries.

I imagine that the syntax in the Lucene documentation should work (i.e., [director] *should not* be equivalent to [director*]). It's either a bug somewhere, or I'm not configuring/using something correctly. Does anyone have an idea?

Thanks again,
--- Yaniv

On Wed, Sep 7, 2011 at 8:31 PM, Axel Morgner a...@morgner.de wrote:

Hi Yaniv,

didn't try your case, just read the code. If I remember correctly, it may help to expand your search term "director john" into a Lucene query, e.g. something like \"director\" OR \"john\". For the complete Lucene query syntax, see [1].

Greetings
Axel

[1] http://lucene.apache.org/java/3_1_0/queryparsersyntax.html
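The wildcard semantics Yaniv expects from the Lucene syntax page can be spelled out with a small plain-Java sketch: * stands for any run of characters, ? for exactly one, and a term without wildcards matches only itself. This is not Lucene's implementation; the `WildcardTermSketch` class and its regex translation are made up here purely to pin down the expected behaviour against single indexed terms.

```java
import java.util.regex.Pattern;

final class WildcardTermSketch {

    /** Translates a wildcard term into a regex: '*' -> ".*", '?' -> ".", everything else literal. */
    static Pattern toPattern(String wildcardTerm) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcardTerm.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True when the whole indexed term matches the wildcard term. */
    static boolean matches(String wildcardTerm, String indexedTerm) {
        return toPattern(wildcardTerm).matcher(indexedTerm).matches();
    }

    public static void main(String[] args) {
        String[] queries = {"director", "director*", "test?", "te*st"};
        String[] terms = {"director", "directory", "tests", "test"};
        for (String query : queries) {
            for (String term : terms) {
                System.out.printf("[%s] vs %s: %b%n", query, term, matches(query, term));
            }
        }
    }
}
```

With these rules [director] and [director*] are clearly different queries, which is the behaviour the post argues the index should show for a single term.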