Re: [Neo4j] neo4j REST server configuration
Hi,

The heap part was resolved. Thanks for adding it to the wiki.

Any ideas as to why the JMX doesn't show the info when attached?

~Mohit

----- Original Message -----
From: Mattias Persson matt...@neotechnology.com
To: Neo4j user discussions user@lists.neo4j.org
Sent: Tue, September 14, 2010 2:12:13 AM
Subject: Re: [Neo4j] neo4j REST server configuration

Is this resolved? Take a look at http://wiki.neo4j.org/content/Getting_Started_REST#Configure_amount_of_memory otherwise.

2010/8/7 Mohit Vazirani mohi...@yahoo.com

Hi,

I'm running the standalone neo4j REST server on a 64 bit linux machine with 64GB RAM and am trying to configure the following memory settings through the wrapper.conf file:

    wrapper.java.initmemory=16144
    wrapper.java.maxmemory=16144

However when I restart the server, JMX shows me the following VM arguments:

    -Dcom.sun.management.jmxremote
    -Xms4096m
    -Xmx4096m
    -Djava.library.path=lib
    -Dwrapper.key=q8W6vP8LS9mj0ekz
    -Dwrapper.port=32000
    -Dwrapper.jvm.port.min=31000
    -Dwrapper.jvm.port.max=31999
    -Dwrapper.pid=27943
    -Dwrapper.version=3.2.3
    -Dwrapper.native_library=wrapper
    -Dwrapper.service=TRUE
    -Dwrapper.cpu.timeout=10
    -Dwrapper.jvmid=1

Another unrelated issue is that JMX Mbeans shows configuration attributes as unavailable when I attach to the REST wrapper.

The reason I am looking into modifying the configuration is that my client servers seem to be timing out. The server cannot handle more than 5 concurrent transactions, so I want to tweak the heap size and see if that helps.

Thanks,
~Mohit

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] neo4j REST server configuration
Mohit, are you connecting via JConsole to the running process to see the JMX data?

Cheers,

/peter neubauer

VP Product Development, Neo Technology
GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
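For the heap question in this thread, one way to double-check what the JVM actually received, independently of JConsole, is to ask the runtime itself. The following is only a sketch: the class name HeapCheck and its helper method are made up for illustration and are not part of the Neo4j REST server. It relies solely on the standard java.lang.management API and would need to run inside a JVM started with the same wrapper settings.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Neo4j: reports the heap settings a JVM was started with.
public class HeapCheck {

    // Returns only the -Xms/-Xmx entries from a list of JVM arguments.
    public static List<String> heapArguments(List<String> jvmArguments) {
        List<String> result = new ArrayList<String>();
        for (String argument : jvmArguments) {
            if (argument.startsWith("-Xms") || argument.startsWith("-Xmx")) {
                result.add(argument);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The same argument list that JConsole shows under "VM arguments".
        List<String> all = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Heap arguments: " + heapArguments(all));

        // Maximum heap the runtime believes it may use, in megabytes.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap reported by the runtime: " + maxMb + " MB");
    }
}
```

If the printed values still read 4096m after editing wrapper.conf, the wrapper is most likely picking up its memory settings from somewhere else, which matches the symptom described above.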
Re: [Neo4j] Relationship Check During Traversal
Hi,

My suggestion is to look at the BestFirstSelectorFactory abstract class: http://components.neo4j.org/neo4j-graph-algo/apidocs/org/neo4j/graphalgo/util/BestFirstSelectorFactory.html

Extend that and fill in the methods. It worked wonders for me when traversing a weighted graph, and it was much faster than the Dijkstra or A* factory methods, as I could control the traversal at a much finer grain with pruners and a filter.

My $0.02

Regards,
Morten Barklund

On Wed, Sep 15, 2010 at 07:51, Paddy paddyf...@gmail.com wrote:

Hi,

I'm trying to set up a traversal in a time-dependent graph with multiple weighted connections between nodes representing minutes. I want to only traverse the first relationship with a value greater than the weight of the traversal's current position, i.e. if the path.weight()=100, only traverse the first relationship with a departure time property =100.

I have implemented a modified BranchSelector to identify the next relationship to traverse depending on the path weight. But once I have identified this relationship in BranchSelector.next(), how can I return a new TraversalBranch using this node? As I see it, the TraversalBranch is created in ExpansionSourceImpl using:

    TraversalBranch next = new ExpansionSourceImpl( traverser, this, depth + 1, node, traverser.description.expander, relationship );

Would I need to create a modified ExpansionSourceImpl? Please let me know if I am going down the right path :-/

thanks
Paddy

On Sat, Sep 11, 2010 at 1:51 PM, Paddy paddyf...@gmail.com wrote:

Hi David,

Thanks for your help, I got it working using the following code. I tested it on a small graph with the neo4j java-dijkstra example and it works :)

Cheers
Paddy

    public static PruneEvaluator pruneAfterTransfer()
    {
        return new PruneEvaluator()
        {
            public boolean pruneAfter( Path path )
            {
                System.out.println( "path " + path );
                int count = 0;
                if ( path.lastRelationship().isType( RelationshipTypes.TRANSFER ) )
                {
                    // Count the transfers on the path so far and prune at the second one.
                    Iterable<Relationship> relationships = path.relationships();
                    for ( Relationship relationship : relationships )
                    {
                        if ( relationship.isType( RelationshipTypes.TRANSFER ) )
                        {
                            if ( ++count == 2 )
                            {
                                System.out.println( "Breaking!!" );
                                return true;
                            }
                        }
                    }
                }
                return false;
            }
        };
    }

    PruneEvaluator prunerAfterTransfer = pruneAfterTransfer();

    private static final TraversalDescription TRAVERSAL = Traversal.description().uniqueness( Uniqueness.NONE ).prune( pruneAfterTransfer() );

On Fri, Sep 10, 2010 at 10:11 PM, David Montag david.mon...@neotechnology.com wrote:

Hi Paddy,

One idea is to prune the traversal by looking at whether the path so far already has a transfer relationship or not. You would then do some kind of filtering of the resulting paths, e.g. only accepting those with correct end nodes. I don't know if the computational complexity of this is acceptable or not though. And I don't know if this answer was relevant or not. I hope it was :)

David

On Sat, Sep 11, 2010 at 4:09 AM, Paddy paddyf...@gmail.com wrote:

Hi,

Just a quick question regarding the use of the PruneEvaluator. I was wondering what would be the best way to modify the TraversalDescription in the Dijkstra algorithm in order to prune a traversal when a branch has reached a second transfer relationship. I want to avoid multiple transfers in a bus network.

If the graph is arranged as:

    (stop:1) --bus (stop:2) --transfer (stop:3) --bus (stop:4) --transfer (stop:5)

Is it possible to prune the traversal branch when the 2nd transfer relationship is reached after (stop:4)? Could this be achieved using a PruneEvaluator? Or am I approaching this the wrong way?

thanks
Paddy
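The pruning rule discussed in this thread (stop expanding a branch once it reaches its second transfer) can be tried out without a database. The sketch below is a library-free illustration, not the Neo4j traversal API: the class TransferPruner is hypothetical, and a path is represented simply as a list of relationship type names rather than a Path object.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, library-free sketch of the "prune at the second transfer" rule.
public class TransferPruner {

    // Relationship types are plain strings here; in the thread they are Neo4j RelationshipTypes.
    public static final String TRANSFER = "TRANSFER";

    // True when the path ends in a transfer and that transfer is at least the second one on the path.
    public static boolean pruneAfter(List<String> relationshipTypes) {
        if (relationshipTypes.isEmpty()) {
            return false; // a start path without relationships is never pruned
        }
        String last = relationshipTypes.get(relationshipTypes.size() - 1);
        if (!TRANSFER.equals(last)) {
            return false;
        }
        int count = 0;
        for (String type : relationshipTypes) {
            if (TRANSFER.equals(type) && ++count == 2) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Path ending on a bus leg after one transfer: keep expanding.
        System.out.println(pruneAfter(Arrays.asList("BUS", "TRANSFER", "BUS")));
        // Path that has just taken its second transfer: prune.
        System.out.println(pruneAfter(Arrays.asList("BUS", "TRANSFER", "BUS", "TRANSFER")));
    }
}
```

Keeping the rule in a small pure function like this makes it easy to check the bus-network example from the thread before wiring the same logic into a PruneEvaluator.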
[Neo4j] Possible functional gap in Lucene indexing?
Hi, all.

We're trying to use Lucene for fulltext indexing of some textual content that is stored in Neo, and we've hit a bit of a roadblock. In some cases, that content will be updated/edited and/or nodes will be removed, but the process by which index information is removed seems awkward. In particular, it would seem that a removeIndex(Node node) method would be extremely helpful for removing all indexes on a particular node. The current method requires retrieving and passing in the original textual content so that the node can be de-indexed. Is there any solution that would allow index removal given only a Node?

Thanks,
Rick
Re: [Neo4j] Possible functional gap in Lucene indexing?
Doh! Seems like we just overlooked the method signature with removeIndex(Node,key), which will do exactly what we want.

Have to lay off the Duff for a while...
Re: [Neo4j] IndexProvider question
I just added a way to do this (not as a persistent config, since they control write behaviour), but instead as an addition to QueryContext. So you can do like this:

    myNodeIndex.query( new QueryContext( "name:Mattias occupation:developer" ).defaultOperator( Operator.AND ) );

I know it's a bit verbose, but it's a start at least. Grab the latest version and try it out to see if it works for you.

2010/9/10 Mattias Persson matt...@neotechnology.com

2010/9/10, Honnur Vorvoi vhon...@yahoo.com:

> I would like to set AND as the default operator when I create an index using the new index library:
>
>     Index<Node> index = indexProvider.nodeIndex( "fulltext", LuceneIndexProvider.FULLTEXT_CONFIG );
>
> I didn't find setDefaultOperator (similar to the one in LuceneFulltextQueryIndexService) in any of the provider classes. Is it supported in the new index provider? If not, is there a way we can set the same? Thanks in advance.

That functionality is easy to add, I just haven't gotten around to doing it. I'll try to add that as soon as possible. Excellent feedback on the new IndexProvider framework, keep it coming!

--- On Thu, 9/9/10, Honnur Vorvoi vhon...@yahoo.com wrote:

From: Honnur Vorvoi vhon...@yahoo.com
Subject: Re: [Neo4j] IndexProvider question
To: user@lists.neo4j.org
Date: Thursday, September 9, 2010, 10:33 PM

Thanks Mattias. Since IndexProvider does all LuceneFulltextQueryIndexService can do and much more, I am going to use just IndexProvider.

Date: Wed, 8 Sep 2010 16:28:56 +0200
From: Mattias Persson matt...@neotechnology.com
Subject: Re: [Neo4j] IndexProvider question
To: Neo4j user discussions user@lists.neo4j.org

Hi Honnur!

2010/9/6, Honnur Vorvoi vhon...@yahoo.com:

> Hello, I have the following questions with regard to the IndexProvider (example below):
>
> 1. I already have LuceneFulltextQueryIndexService. Can I use IndexProvider with the same graphDb as well? Or are they mutually exclusive?

They are separate from one another, so both can be used alongside each other. Something stored in either LuceneIndexService or LuceneIndexProvider won't affect the other.

> 2. What does the param "users" in provider.nodeIndex( "users" ) represent?

The LuceneIndexService can only keep values from one key in each index, but the new LuceneIndexProvider can spawn indexes which can contain any number of keys and values (making compound queries possible). Since an index isn't tied to a property key you must name each index yourself. Each index can also be configured to be either fulltext or not, to use lower case conversion or not, and so on.

> 3. Do I need to add all the properties in Index<Node> (line# 45) in order to query? (I have already indexed the same properties with LuceneFulltextQueryIndexService)

See my answer for (1). In short: LuceneIndexProvider and the indexes it spawns have nothing to do with LuceneIndexService (or any derivative thereof) and hence can't share state.

> 4. Is it easy to include the query(String) method in LuceneFulltextQueryIndexService, so I can use just one index service? Otherwise I would be using LuceneIndexProvider just for the query(String) method.

To add compound querying, the storage format (i.e. Lucene usage) needed to change in incompatible ways, so it isn't an easy fix to add that. It could however be done by querying multiple indexes in parallel and merging the results afterwards, but I don't think performance would be anywhere near using Lucene the right way for compound queries, as LuceneIndexProvider does.

> As always, appreciate your suggestions/recommendations
>
>      1  IndexProvider provider = new LuceneIndexProvider( graphDb );
>      2  Index<Node> myIndex = provider.nodeIndex( "users" );
>      3
>      4  myIndex.add( myNode, "type", "value1" );
>      5  myIndex.add( myNode, "key1", "value2" );
>      6
>      7  // Ask lucene queries directly here
>      8  for ( Node searchHit : myIndex.query( "type:value1 AND key1:value2" ) )
>      9  {
>     10      System.out.println( "Found " + searchHit );
>     11  }

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
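To make the effect of a default operator concrete, here is a small stand-alone sketch. It does not use Lucene or the Neo4j index API: the class DefaultOperatorSketch, its Operator enum and the matches method are invented for illustration, and the "query language" is reduced to space-separated key:value terms. It only shows how the same query string selects different nodes depending on whether unjoined terms are combined with AND or with OR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Lucene-free illustration of what a default operator changes.
public class DefaultOperatorSketch {

    public enum Operator { AND, OR }

    // Matches space-separated "key:value" terms against a property map.
    // Terms without an explicit operator are combined with defaultOperator.
    public static boolean matches(Map<String, String> properties, String query, Operator defaultOperator) {
        boolean any = false;
        boolean all = true;
        for (String term : query.trim().split("\\s+")) {
            int colon = term.indexOf(':');
            if (colon < 0) {
                continue; // ignore malformed terms in this sketch
            }
            String key = term.substring(0, colon);
            String value = term.substring(colon + 1);
            boolean hit = value.equals(properties.get(key));
            any |= hit;
            all &= hit;
        }
        // AND needs every term to hit (and at least one term); OR needs a single hit.
        return defaultOperator == Operator.AND ? (all && any) : any;
    }

    public static void main(String[] args) {
        Map<String, String> node = new HashMap<String, String>();
        node.put("name", "Mattias");
        node.put("occupation", "hacker");

        String query = "name:Mattias occupation:developer";
        System.out.println("OR:  " + matches(node, query, Operator.OR));
        System.out.println("AND: " + matches(node, query, Operator.AND));
    }
}
```

With OR as the default the example node matches on name alone; with AND it does not, because the occupation term fails. That is the behavioural difference the defaultOperator setting in the thread is about.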
Re: [Neo4j] Possible functional gap in Lucene indexing?
2010/9/15 rick.bullo...@burningskysoftware.com

> Doh! Seems like we just overlooked the method signature with removeIndex(Node,key), which will do exactly what we want.

Excellent!

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
[Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
I've noticed that when indexing full text, the last term/word is always ignored. This is a major issue, but I'm not sure if it is in the index utils or in Lucene itself. Any thoughts?

Thanks,
Rick
Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
That sounds weird. Look at the TestLuceneFulltextIndexService#testSimpleFulltext method; it queries for the last word and it seems to work. Could you provide more info on this?

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
Using neo4j-index-1.1 and lucene-core-2.9.2, by the way.
Re: [Neo4j] Using the REST neo4j
Hi Alex, sorry this response took me so long, see responses inline!

2010/9/10 Alexandru Popescu ☀ the.mindstorm.mailingl...@gmail.com

> On Tuesday, September 7, 2010, Jacob Hansson ja...@voltvoodoo.com wrote:
>> 2010/9/6 Alexandru Popescu ☀ the.mindstorm.mailingl...@gmail.com
>>> On Monday, September 6, 2010, Jim Webber j...@webber.name wrote:
>>>> Hi Alex,
>>>>
>>>>> While I still can achieve all these with the current packaging, it feels more hacky: I need to create a new Jetty6BasedWebServer or modify the existing one to enhance it with my own stuff. Each change would require compiling and repackaging the whole neo4j-rest. Definitely not as easy as dropping in my own jar and a new web.xml.
>>>>
>>>> That's an interesting point. In a sense, the neo-rest package is Neo's REST package.
>>>
>>> Interesting... My main question is: what exactly is this package offering to the end user in the current form? IMO it cannot be an off-the-shelf product as there is no security. It is not a library either, as extending it is not so easy. Basically, and without any intention to harm any feelings, it looks like one of those dummy web UI interface to X. And I'd say it has much more potential than that!
>
> Jacob, I must confess I'm totally confused by your comments below.
>
>> I've always seen it as the beginnings of a proper stand-alone neo4j server.
>
> If it is the beginning, then what comes next? And more importantly from whom? Basically my proposal was meant to make things easier for people to build on top of it, so I'm not really sure how you see the continuation of it.

There are lots of cool things that can be done next. Both continuing to extend the functionality of the REST server, but also (and more importantly) to look at and help out with the work being done on clients for the server in various languages. For instance I'd love to see simple-to-use ORMs on top of the python and php clients, enabling web developers to start building stuff with Neo4j as their database tier.

>> A REST/JSON API to Neo4j opens up for remote clients in any language, and would be an important part in matching offerings from other database vendors. While extendability is a great thing, building it as a library and/or packaging it as a WAR makes it very java-centric.
>
> Currently the neo4j-rest is distributed as a java application. So it is java-centric. What makes it attractive is that it allows using the HTTP protocol. Providing neo4j-rest as both a self contained app and as a web app will give you exactly the same benefits, with additional freedom on choosing how to use it, what servers to deploy it to, etc.

True, the neo4j-rest project is, but I see the biggest potential of the REST project in its stand-alone version, neo4j-rest-standalone. The reason for that is precisely what I mentioned before, the fact that it is not java centric. Distributing both as a WAR and as a stand-alone application sounds like a great solution!

>> Like you say, there is no security, and I agree it is currently the main culprit stopping neo4j REST from production use. This can of course be offset with firewalling etc, but I couldn't agree more of the importance of a proper security layer.
>
> Security was used as a basic example of things that could be much easier to be added on top of the neo4j-rest if provided in a simpler format. As you probably know already firewalls will give you at most a very basic sort of authentication, but nothing else.
>
>> As far as UI interface to X goes, the area to focus on I think is the JSON part of the API. With that, a UI can be built in any language. Take a look at http://github.com/neo4j/webadmin for a more powerful browsing UI for neo4j REST.
>
> I think you mis-read my post. I'm not looking for a nice UI, but rather for a basis to further build REST services on top of a neo4j db. As Jim mentioned in his posts, currently neo4j-rest is just exposing the basics of a neo4j db.

It's true that the functionality exposed by neo4j-rest so far is fairly basic. There are several important parts of the neo4j API that should be exposed via REST (like transactions). I don't see that as an argument to make neo4j-rest more extendable though, as I feel these core items should be added the same way the data browsing, index and traversal APIs have been added.

That said - extendability would be a great thing, and there are ways to make neo4j-rest much more accessible than it is today. I know Andreas is looking into the possibility of making neo4j-rest use OSGi-magic, which if implemented would make it possible to hot-deploy extensions into neo4j-rest as well as package extensions with it.

I think the main reason of our disagreement (and my confusing answers :) ) is that we view neo4j-rest from two sides. I see it through the eyes of a web-developer. I'm used to having my database at some given port and a client in my web tier that throws work at the database. You see it as a
Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
Couldn't it be that sentences end with a dot... so "Cheese is good." will index the words: [Cheese, is, good.]? Observe the last word isn't "good", it's "good." with a dot. I know that has messed up some searches for me at least. You could perhaps override the implementation and instantiate an Analyzer/Tokenizer which gets rid of such punctuation characters?

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
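The trailing-dot effect described above is easy to reproduce with plain string handling. The sketch below is an illustration only and does not use Lucene's analyzers: the class TrailingDotSketch and both of its methods are made up, and splitting on whitespace is an assumption standing in for whatever tokenizer the index actually applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Lucene's analyzer: shows how a trailing dot ends up inside the last token.
public class TrailingDotSketch {

    // Naive tokenization: split on whitespace only, punctuation stays attached.
    public static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Same split, but leading and trailing punctuation is trimmed from every token.
    public static List<String> trimmedTokens(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : whitespaceTokens(text)) {
            String token = raw.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceTokens("Cheese is good.")); // last token is "good." with the dot
        System.out.println(trimmedTokens("Cheese is good."));    // last token is "good"
    }
}
```

A search for good only finds the document in the second case, which is why stripping punctuation before or during analysis matters here.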
Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
Actually, it seems like a deeper bug/design flaw in Lucene's analyzer/tokenizer. The actual text is HTML text, with <p> and </p> wrappers. Lucene somewhat randomly seems to treat the last two words as a single token, and in other cases ignores it altogether. The dot character screws it up even more, because even if it tokenizes with the dot character, you can't query with it (or at least nothing gets returned).

Hmmm. I really don't want to have to write a tokenizer/analyzer if I can avoid it. Seems like a LOT of work. Do you have any example code of a custom tokenizer/analyzer we could start from?

Thanks,
Rick
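Short of writing a custom analyzer, one option is to normalize the text before it is handed to the index. The following is a rough sketch under stated assumptions: the class IndexTextNormalizer is hypothetical, the regex-based tag stripping is crude (it does not decode entities such as &amp; and is no substitute for a real HTML parser), and nothing here calls Lucene or Neo4j.

```java
import java.util.Locale;

// Hypothetical pre-processing step, not part of Neo4j or Lucene.
public class IndexTextNormalizer {

    // Strips markup, lowercases, removes punctuation and collapses whitespace.
    public static String normalize(String html) {
        // Replace tags such as <p> and </p> with a space so adjacent words do not merge.
        String text = html.replaceAll("<[^>]*>", " ");
        text = text.toLowerCase(Locale.ENGLISH);
        // Keep only letters, digits and whitespace.
        text = text.replaceAll("[^\\p{L}\\p{N}\\s]", " ");
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("<p>Cheese is good.</p>")); // prints: cheese is good
    }
}
```

Storing the normalized string on the node alongside the original would also give the exact value needed later when the node has to be removed from the index.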
Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
Hi,

I might be overly simplistic here, but why not lowercase the text, remove the HTML markup, then remove all non-word-or-space characters, store this as the stripped version of the text on the node (for de-indexing), and index this?

/Barklund

On Wed, Sep 15, 2010 at 18:07, rick.bullo...@burningskysoftware.com wrote:

Actually, it seems like a deeper bug/design flaw in Lucene's analyzer/tokenizer. The actual text is HTML text, with <p> and </p> wrappers. Lucene somewhat randomly seems to treat the last two words as a single token, and in other cases ignores it altogether. The dot character screws it up even more, because even if it tokenizes with the dot character, you can't query with it (or at least nothing gets returned).

Hmmm. I really don't want to have to write a tokenizer/analyzer if I can avoid it. Seems like a LOT of work. Do you have any example code of a custom tokenizer/analyzer we could start from?

Thanks, Rick

Original Message
Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
From: Mattias Persson matt...@neotechnology.com
Date: Wed, September 15, 2010 11:47 am
To: Neo4j user discussions u...@lists.neo4j.org

Couldn't it be that the sentence ends with a dot? So "Cheese is good." will index the words [Cheese, is, good.]. Observe that the last word isn't "good", it's "good." with a dot. I know that has messed up some searches for me at least. You could perhaps override the implementation and instantiate an Analyzer/Tokenizer which gets rid of such punctuation characters?

2010/9/15 rick.bullo...@burningskysoftware.com

Using neo4j-index-1.1 and lucene-core-2.9.2, by the way.

Original Message
Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
From: Mattias Persson matt...@neotechnology.com
Date: Wed, September 15, 2010 10:37 am
To: Neo4j user discussions u...@lists.neo4j.org

That sounds weird. Look at the TestLuceneFulltextIndexService#testSimpleFulltext method; it queries for the last word and it seems to work. Could you provide more info on this?

2010/9/15 rick.bullo...@burningskysoftware.com

I've noticed that when indexing full text, the last term/word is always ignored. This is a major issue, but I'm not sure if it is in the index utils or in Lucene itself. Any thoughts?

Thanks, Rick
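For anyone wanting to try the "lowercase, strip markup, drop punctuation" idea from the message above, here is a minimal plain-Java sketch. It is an assumption-laden illustration, not code from neo4j-index or Lucene: the class name TextNormalizer and the method normalize are made up, and the regex-based tag removal is only a rough stand-in for a real HTML stripper (it does not decode entities such as &amp;). Tags are replaced with a space rather than nothing, so words on either side of a tag are not glued together.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of neo4j-index or Lucene.
final class TextNormalizer {
    private TextNormalizer() {}

    /** Lowercases, strips markup, and drops punctuation, leaving words separated by single spaces. */
    static String normalize(String html) {
        String text = html.toLowerCase(Locale.ROOT);
        // Replace each tag with a space (not an empty string) so that
        // "foo<br/>bar" becomes "foo bar" rather than "foobar".
        text = text.replaceAll("<[^>]*>", " ");
        // Drop everything that is neither a letter, a digit, nor whitespace.
        text = text.replaceAll("[^\\p{L}\\p{N}\\s]", "");
        // Collapse runs of whitespace and trim the ends.
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("<p>Cheese is good.</p>")); // prints: cheese is good
    }
}
```

The normalized string could then be stored on the node and passed to the index call, so the same value is available later for de-indexing.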
Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
Removing HTML markup is not a trivial task, but luckily, the Apache Solr team has already created additional analyzers for Lucene that do what I need (the analysis package in Solr has a lot of really good stuff in it). I will still need some help from the Neo team to understand how to use a specific analyzer instead of the default one...

Thanks, Rick

Original Message
Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
From: Morten Barklund mor...@barklund.dk
Date: Wed, September 15, 2010 12:29 pm
To: Neo4j user discussions u...@lists.neo4j.org

Hi, I might be overly simplistic here, but why not lowercase the text, remove the HTML markup, then remove all non-word-or-space characters, store this as the stripped version of the text on the node (for de-indexing), and index this?

/Barklund
Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
Seems like neo4j-index-1.1 is using a lowercase whitespace tokenizer. Lucene's StandardTokenizer splits on punctuation (for specifics see http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/2.9.2/org/apache/lucene/analysis/standard/StandardTokenizer.java?av=f). I think you could strip the HTML markup, then use the StandardTokenizer instead of the WhitespaceAnalyzer.

-- Toby Matejovsky

On Wed, Sep 15, 2010 at 12:07 PM, rick.bullo...@burningskysoftware.com wrote:

Actually, it seems like a deeper bug/design flaw in Lucene's analyzer/tokenizer. The actual text is HTML text, with <p> and </p> wrappers. Lucene somewhat randomly seems to treat the last two words as a single token, and in other cases ignores it altogether. The dot character screws it up even more, because even if it tokenizes with the dot character, you can't query with it (or at least nothing gets returned).

Hmmm. I really don't want to have to write a tokenizer/analyzer if I can avoid it. Seems like a LOT of work. Do you have any example code of a custom tokenizer/analyzer we could start from?

Thanks, Rick
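To make the difference between the two tokenizing styles discussed in this thread concrete, here is a small plain-Java approximation. It deliberately does not use Lucene: the class TokenizerComparison and both methods are hypothetical, and the real WhitespaceAnalyzer and StandardTokenizer have more rules than these two regular expressions. It only shows why a trailing dot stays attached to the last word when splitting on whitespace alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java approximation for illustration only; these are NOT Lucene's analyzers.
final class TokenizerComparison {
    private TokenizerComparison() {}

    /** Lowercases and splits on whitespace only, so trailing punctuation stays on the word. */
    static List<String> whitespaceTokens(String text) {
        return split(text, "\\s+");
    }

    /** Lowercases and splits on any run of characters that are not letters or digits. */
    static List<String> punctuationAwareTokens(String text) {
        return split(text, "[^\\p{L}\\p{N}]+");
    }

    private static List<String> split(String text, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split(regex)) {
            if (!token.isEmpty()) {
                tokens.add(token); // skip the empty leading token split() can produce
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceTokens("Cheese is good."));       // [cheese, is, good.]
        System.out.println(punctuationAwareTokens("Cheese is good.")); // [cheese, is, good]
    }
}
```

With the whitespace-only split, a query for "good" would not match the stored token "good.", which is consistent with the "last word is ignored" symptom reported at the start of the thread.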
Re: [Neo4j] Using the REST neo4j
On Wed, Sep 15, 2010 at 5:37 PM, Jacob Hansson ja...@voltvoodoo.com wrote:

Hi Alex, sorry this response took me so long, see responses inline!

2010/9/10 Alexandru Popescu ☀ the.mindstorm.mailingl...@gmail.com

On Tuesday, September 7, 2010, Jacob Hansson ja...@voltvoodoo.com wrote:

2010/9/6 Alexandru Popescu ☀ the.mindstorm.mailingl...@gmail.com

On Monday, September 6, 2010, Jim Webber j...@webber.name wrote:

Hi Alex,

While I still can achieve all these with the current packaging, it feels more hacky: I need to create a new Jetty6BasedWebServer or modify the existing one to enhance it with my own stuff. Each change would require compiling and repackaging the whole neo4j-rest. Definitely not as easy as dropping in my own jar and a new web.xml.

That's an interesting point. In a sense, the neo-rest package is Neo's REST package.

Interesting... My main question is: what exactly is this package offering to the end user in the current form? IMO it cannot be an off-the-shelf product as there is no security. It is not a library either, as extending it is not so easy. Basically, and without any intention to harm any feelings, it looks like one of those dummy web UI interfaces to X. And I'd say it has much more potential than that!

Jacob, I must confess I'm totally confused by your comments below.

I've always seen it as the beginnings of a proper stand-alone neo4j server.

If it is the beginning, then what comes next? And more importantly from whom? Basically my proposal was meant to make things easier for people to build on top of it, so I'm not really sure how you see the continuation of it.

There are lots of cool things that can be done next. Both continuing to extend the functionality of the REST server, but also (and more importantly) to look at and help out with the work being done on clients for the server in various languages. For instance I'd love to see simple-to-use ORMs on top of the python and php clients, enabling web developers to start building stuff with Neo4j as their database tier.

Just to clarify (trying to keep my answers less confusing :) ): To answer the last portion of your question, I think making neo4j REST more extensible is great (see my last answer). However, I read your initial proposal as an argument for viewing neo4j-rest as a library and for switching to a WAR packaging model. Both of which I strongly oppose *if* it is done at the expense of the stand alone application.

A REST/JSON API to Neo4j opens up for remote clients in any language, and would be an important part in matching offerings from other database vendors. While extendability is a great thing, building it as a library and/or packaging it as a WAR makes it very java-centric.

Currently the neo4j-rest is distributed as a java application. So it is java-centric. What makes it attractive is that it allows using the HTTP protocol. Providing neo4j-rest as both a self contained app and as a web app will give you exactly the same benefits, with additional freedom on choosing how to use it, what servers to deploy it to, etc.

True, the neo4j-rest project is, but I see the biggest potential of the REST project in its stand-alone version, neo4j-rest-standalone. The reason for that is precisely what I mentioned before, the fact that it is not java centric. Distributing both as a WAR and as a stand-alone application sounds like a great solution!

Like you say, there is no security, and I agree it is currently the main culprit stopping neo4j REST from production use. This can of course be offset with firewalling etc, but I couldn't agree more on the importance of a proper security layer.

Security was used as a basic example of things that could be much easier to be added on top of the neo4j-rest if provided in a simpler format. As you probably know already firewalls will give you at most a very basic sort of authentication, but nothing else.

As far as UI interface to X goes, the area to focus on I think is the JSON part of the API. With that, a UI can be built in any language. Take a look at http://github.com/neo4j/webadmin for a more powerful browsing UI for neo4j REST.

I think you mis-read my post. I'm not looking for a nice UI, but rather for a basis to further build REST services on top of a neo4j db. As Jim mentioned in his posts, currently neo4j-rest is just exposing the basics of a neo4j db.

It's true that the functionality exposed by neo4j-rest so far is fairly basic. There are several important parts of the neo4j API that should be exposed via REST (like transactions). I don't see that as an argument to make neo4j-rest more extendable though, as I feel these core items should be added the same way the data browsing, index and traversal APIs have been added. That said - extendability would be a great thing, and there are ways to make neo4j-rest much more accessible than
Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
Actually, I ended up coming up with a workaround that involved using HTMLStripReader/HTMLStripCharFilter for pre-parsing the text before passing it into the neo .index(node, key, value) method. Works great, though there's a little extra string allocation involved. It won't be invoked often, so it isn't a big concern.

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Toby Matejovsky
Sent: Wednesday, September 15, 2010 12:57 PM
To: Neo4j user discussions
Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term

This is probably what you just found, but for others: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

-- Toby Matejovsky
Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
Well, I have it implemented, and it is cleaning up the content, but the standard Lucene analyzer still isn't working correctly. Random words are completely ignored even with no special markup in the content, sometimes words are combined, punctuation is never removed, etc. Something is really wrong, IMO. Does anyone know of a way to dump out what the Lucene tokenizer is generating in terms of splitting the text into tokens/words?

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Rick Bullotta
Sent: Wednesday, September 15, 2010 2:23 PM
To: 'Neo4j user discussions'
Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term

Actually, I ended up coming up with a workaround that involved using HTMLStripReader/HTMLStripCharFilter for pre-parsing the text before passing it into the neo .index(node, key, value) method. Works great, though there's a little extra string allocation involved. It won't be invoked often, so it isn't a big concern.
Re: [Neo4j] Possible functional gap in Lucene indexing?
I tried something similar but hit a roadblock when I couldn't find a way to retrieve the indexes stored for a node, so I'm wondering whether Lucene can do that with decent performance... I don't know if it can retrieve a relationship like indexed fields -> node. Does anyone know if that is possible?

On Wed, Sep 15, 2010 at 9:03 AM, rick.bullo...@burningskysoftware.com wrote:

Hi, all. We're trying to use Lucene for fulltext indexing of some textual content that is stored in Neo, and we've hit a bit of a roadblock. In some cases, that content will be updated/edited and/or nodes will be removed, but the process by which index information is removed seems awkward. In particular, it would seem that a removeIndex(Node node) method would be extremely helpful for removing all indexes on a particular node. The current method requires retrieving and passing in the original textual content so that the node can be de-indexed. Is there any solution that would allow index removal given only a Node?

Thanks, Rick
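One way to get something like removeIndex(Node) without help from the index itself is to keep reverse bookkeeping: remember, per node, which key/value pairs were indexed, and replay them on removal. The sketch below is a self-contained in-memory illustration of that idea only. TrackedIndex, its methods, and the use of plain long ids instead of Node objects are all assumptions made for the example; in a real Neo4j setup the remembered values might instead live in a node property, as suggested elsewhere on this list.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for an index service; all names are illustrative.
final class TrackedIndex {
    // (key, value) -> ids of the nodes indexed under that pair (the forward index).
    private final Map<Map.Entry<String, String>, Set<Long>> forward = new HashMap<>();
    // node id -> every (key, value) pair indexed for that node (the reverse bookkeeping).
    private final Map<Long, Set<Map.Entry<String, String>>> entriesByNode = new HashMap<>();

    void index(long nodeId, String key, String value) {
        Map.Entry<String, String> entry = new SimpleImmutableEntry<>(key, value);
        forward.computeIfAbsent(entry, e -> new HashSet<>()).add(nodeId);
        entriesByNode.computeIfAbsent(nodeId, n -> new HashSet<>()).add(entry);
    }

    Set<Long> get(String key, String value) {
        return forward.getOrDefault(new SimpleImmutableEntry<>(key, value),
                Collections.<Long>emptySet());
    }

    /** Removes every entry for the node; the caller does not need the original values. */
    void removeIndex(long nodeId) {
        Set<Map.Entry<String, String>> entries = entriesByNode.remove(nodeId);
        if (entries == null) {
            return; // nothing was ever indexed for this node
        }
        for (Map.Entry<String, String> entry : entries) {
            Set<Long> nodes = forward.get(entry);
            if (nodes != null) {
                nodes.remove(nodeId);
                if (nodes.isEmpty()) {
                    forward.remove(entry); // drop empty buckets
                }
            }
        }
    }
}
```

The cost of this design is the extra storage for the reverse map (or node property) and the need to keep it in step with every index call.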
Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term
Solr's LowerCaseTokenizer drops all non-letters (contrast with Lucene's LowerCaseFilter, which just lowercases letters and doesn't drop anything). See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory

It might be worth trying to tokenize with that after stripping HTML, and possibly after using the StandardFilter (so apostrophed words stay together, e.g. "they're" ends up as "theyre" instead of [they, re]).

Regarding the random combined words, my *guess* is that the HTMLStripCharFilter does something like this: <p>Foo<br/>Bar</p> ===> FooBar instead of Foo Bar.

-- Toby Matejovsky

On Wed, Sep 15, 2010 at 2:53 PM, Rick Bullotta rick.bullo...@burningskysoftware.com wrote:

Looking at the StandardAnalyzer/Tokenizer (which use JFlex internally), it appears that the grammar used by the parser doesn't consider ? and ! as punctuation! Grrr.

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Rick Bullotta
Sent: Wednesday, September 15, 2010 2:44 PM
To: 'Neo4j user discussions'
Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term

Well, I have it implemented, and it is cleaning up the content, but the standard Lucene analyzer still isn't working correctly. Random words are completely ignored even with no special markup in the content, sometimes words are combined, punctuation is never removed, etc. Something is really wrong, IMO. Does anyone know of a way to dump out what the Lucene tokenizer is generating in terms of splitting the text into tokens/words?
-Original Message- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Rick Bullotta Sent: Wednesday, September 15, 2010 2:23 PM To: 'Neo4j user discussions' Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term Actually, I ended up coming with a workaround that involved using HTMLStripReader/HTMLStripCharFilter for pre-parsing the text before passing it into the neo .index(node,key,value) method. Works great, though there's a little extra string allocation involved. It won't be invoked often, so it isn't a big concern. -Original Message- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Toby Matejovsky Sent: Wednesday, September 15, 2010 12:57 PM To: Neo4j user discussions Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term This is probably what you just found, but for others: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCh arFilterFactory -- Toby Matejovsky On Wed, Sep 15, 2010 at 12:49 PM, rick.bullo...@burningskysoftware.comwrote: Removing HTML markup is not a trivial task, but luckily, the Apache Solr team has already created additional analyzers for Lucene that do what I need (the analysis package in solr has a lot of really good stuff in it); I will still need some help from the Neo team to understand how use a specific analyzer instead of the default one... Thanks, Rick Original Message Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term From: Morten Barklund [1]mor...@barklund.dk Date: Wed, September 15, 2010 12:29 pm To: Neo4j user discussions [2]u...@lists.neo4j.org Hi I might be overly simplistic here, but why not lowercase the text, remove html markup, then remove all non-word-or-space-characters, store this as the stripped version of the text on the node (for de-indexing) and index this? 
/Barklund On Wed, Sep 15, 2010 at 18:07, [3]rick.bullo...@burningskysoftware.com wrote: Actually, it seems like a deeper bug/design flaw in Lucene's analyzer/tokenizer. The actual text is HTML text, with p and /p wrappers. Lucene somewhat randomly seems to treat the last two words as a single token, and in other cases ignore it altogether. The dot character screws it up even more, because even if it tokenizes with the dot character, you can't query with it (or at least nothing gets returned). Hmmm. I really don't want to have to write a tokenizer/analyzer if I can avoid it. Seems like a LOT of work. Do you have any example code of a custom tokenizer/analyzer we could start from? Thanks, Rick Original Message Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring last word/term From: Mattias Persson [1][4]matt...@neotechnology.com Date: Wed, September 15, 2010 11:47 am To: Neo4j user discussions [2][5]u...@lists.neo4j.org Couldn't it be that sentences ends with a dot... so Cheese is good. will index the words: [Cheese, is, good.] ? Observe