Re: [Neo4j] problem with neography and index.

Michael Hunger Fri, 01 Jul 2011 05:28:06 -0700

You have a few options here:

* Paging is currently supported in the REST API only for traversals; the other 
request types will get it in 1.5. So you could use the paging functionality for 
your traverser (I don't know whether Max already supports that in neography).
* You could use either the cypher 
(http://docs.neo4j.org/chunked/snapshot/cypher-plugin.html) or the gremlin 
plugin (http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html) to write a 
request against their "execute_script" API, which then does the index lookup + 
limit.
* You can write your own server plugin that does this: 
http://docs.neo4j.org/chunked/snapshot/server-plugins.html
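
For the first option, here is a minimal sketch of how a paged traversal request could be put together in Ruby. The `/paged/traverse/node` path, the `pageSize` and `leaseTime` query parameters, and the underscore key names in the body are assumptions based on the REST documentation and should be checked against your server version. `paged_traverse_url` and `traversal_body` are hypothetical helper names, not part of neography.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: builds the URL that starts a paged traversal.
# Path and query parameter names are assumptions; verify them for your server.
def paged_traverse_url(base, node_id, page_size, lease_seconds)
  query = URI.encode_www_form('pageSize' => page_size, 'leaseTime' => lease_seconds)
  "#{base}/node/#{node_id}/paged/traverse/node?#{query}"
end

# Traversal description mirroring the "link" traversal from the thread,
# written with the raw REST key names (assumed) instead of neography's.
def traversal_body
  {
    'relationships' => [{ 'type' => 'link', 'direction' => 'out' }],
    'prune_evaluator' => {
      'language' => 'javascript',
      'body' => "position.endNode().getProperty('parsed') == 'false';"
    },
    'return_filter' => { 'language' => 'builtin', 'name' => 'all_but_start_node' }
  }.to_json
end

puts paged_traverse_url('http://localhost:7474/db/data', 1, 100, 60)
```

If those assumptions hold, POSTing the body to that URL with Content-Type application/json should return the first page of results together with a Location header for fetching the following pages.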

Cheers

Michael

On 01.07.2011 at 14:04, Laurent Laborde wrote:

> Ruby crashes when I request all the pages with parsed = false.
> Using the REST interface directly with curl, the result is a huge
> JSON document with ~10,000 nodes.
> 
> Is there a way to limit the result size, like a SQL "SELECT * from
> node where parsed = 'true' limit 100;"?
> I tried using a traverser instead of querying the index:
> 
> node_to_parse = @neo.traverse(ob_root_node, "nodes", {
>   "relationships" => [{"type" => "link", "direction" => "out"}],
>   "prune evaluator" => {"language" => "javascript",
>     "body" => "position.endNode().getProperty('parsed') == 'false';"},
>   "return filter" => {"language" => "builtin", "name" => "all but start node"}})
> 
> 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/protocol.rb:140:in
> `rescue in rbuf_fill': Timeout::Error (Timeout::Error)
>       from 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/protocol.rb:134:in
> `rbuf_fill'
>       from 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/protocol.rb:116:in
> `readuntil'
>       from 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/protocol.rb:126:in
> `readline'
>       from 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:2219:in
> `read_status_line'
>       from 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:2208:in
> `read_new'
>       from 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:1191:in
> `transport_request'
>       from 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:1177:in
> `request'
>       from 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:1170:in
> `block in request'
>       from 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:627:in
> `start'
>       from 
> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:1168:in
> `request'
>       from 
> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/httparty-0.7.8/lib/httparty/request.rb:69:in
> `perform'
>       from 
> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/httparty-0.7.8/lib/httparty.rb:390:in
> `perform_request'
>       from 
> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/httparty-0.7.8/lib/httparty.rb:358:in
> `post'
>       from 
> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/httparty-0.7.8/lib/httparty.rb:426:in
> `post'
>       from 
> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/neography-0.0.13/lib/neography/rest.rb:363:in
> `post'
>       from 
> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/neography-0.0.13/lib/neography/rest.rb:317:in
> `traverse'
>       from nokogiri-test.rb:26:in `<main>'
> 
> 
> -- 
> Keru
> 
> On Fri, Jul 1, 2011 at 1:15 PM, Michael Hunger
> <[email protected]> wrote:
>> Can you call the index REST request manually and see what it returns?
>> 
>> see here 
>> http://components.neo4j.org/neo4j-server/snapshot/rest.html#Index_search_-_Exact_keyvalue_lookup
>> 
>> curl -H Accept:application/json 
>> http://localhost:7474/db/data/index/node/my_nodes/the_key/the_value%20with%20space
>> 
>> see here: 
>> http://stackoverflow.com/questions/547127/in-ruby-how-do-i-replace-the-question-mark-character-in-a-string
>> 
>> require "addressable/uri"
>> Addressable::URI.encode_component("http://test.com?test/test%test",Addressable::URI::CharacterClasses::PATH)
>> 
>> Cheers
>> 
>> Michael
>> 
>> 
>> On 01.07.2011 at 11:23, Laurent Laborde wrote:
>> 
>>> After a few runs (with more and more pages to crawl) it seems
>>> that the result returned by the index is too big:
>>> 
>>> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/crack-0.1.8/lib/crack/json.rb:54:
>>> stack level too deep (SystemStackError)
>>> 
>>> Any ideas? A workaround?
>>> 
>>> thank you
>>> 
>>> --
>>> Ker2x
>>> 
>>> On Fri, Jul 1, 2011 at 10:44 AM, Laurent Laborde <[email protected]> 
>>> wrote:
>>>> I used Base64.encode64 instead! It still didn't work.
>>>> So I used Base64.encode64 with get_node_index instead of
>>>> find_node_index, and it worked \o/
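
One caveat with this workaround, sketched below under stated assumptions: Base64.encode64 inserts newlines into longer output and can emit '+' and '/', which are awkward inside a URL path. Base64.urlsafe_encode64 from the same standard library avoids both. `index_key_for` and `url_from_index_key` are hypothetical helper names; the same helper would have to be used both when adding a node to the index and when looking it up.

```ruby
require 'base64'

# Hypothetical helper: turn an arbitrary URL into an index value that
# contains no '/', '+', or newline, so it can sit in a REST path segment.
def index_key_for(url)
  Base64.urlsafe_encode64(url)
end

# Hypothetical helper: recover the original URL from such an index value.
def url_from_index_key(key)
  Base64.urlsafe_decode64(key)
end

key = index_key_for('http://www.over-blog.com/')
puts key
puts url_from_index_key(key)
```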
>>>> 
>>>> --
>>>> Ker2x
>>>> 
>>>> On Fri, Jul 1, 2011 at 10:25 AM, Laurent Laborde <[email protected]> 
>>>> wrote:
>>>>> Thank you for your help.
>>>>> As you probably noticed, I'm not good with Ruby (I'm a sysadmin ^^)
>>>>> 
>>>>> I tried using URI.encode but it doesn't work as expected.
>>>>> 
>>>>> irb(main):001:0> require 'uri'
>>>>> => true
>>>>> irb(main):002:0> puts URI.escape("http://www.over.blog.com/")
>>>>> http://www.over.blog.com/
>>>>> => nil
>>>>> irb(main):003:0> puts URI.encode("http://www.over.blog.com/")
>>>>> http://www.over.blog.com/
>>>>> => nil
>>>>> 
>>>>> I guess the output should be more like
>>>>> "http%3A%2F%2Fwww.over-blog.com%2", shouldn't it?
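
URI.escape leaves reserved characters such as ':' and '/' alone by default, which is why the URL comes back unchanged (the method was also removed in Ruby 3.0). Below is a minimal sketch of one standard-library alternative that does encode them, ERB::Util.url_encode. `encode_index_value` is a hypothetical helper name, and whether the server accepts encoded slashes inside a path segment is a separate assumption to verify.

```ruby
require 'erb'

# Hypothetical helper: percent-encode a value so it can be used as a single
# path segment (":" and "/" are encoded, spaces become %20 rather than "+").
def encode_index_value(value)
  ERB::Util.url_encode(value)
end

puts encode_index_value('http://www.over-blog.com/')
# prints http%3A%2F%2Fwww.over-blog.com%2F
puts encode_index_value('the value with space')
# prints the%20value%20with%20space
```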
>>>>> 
>>>>> --
>>>>> Ker2x
>>>>> 
>>>>> On Thu, Jun 30, 2011 at 6:40 PM, Michael Hunger
>>>>> <[email protected]> wrote:
>>>>>> You have to escape the URL index value,
>>>>>> otherwise the Jersey REST framework consumes it silently. I had this 
>>>>>> problem when working on the birdies demo app. Took me a while to work 
>>>>>> that out.
>>>>>> 
>>>>>> see http://github.com/jexp/birdies
>>>>>> and http://birdies.heroku.com
>>>>>> 
>>>>>> Michael
>>>>>> 
>>>>>> Sent from my iBrick4
>>>>>> 
>>>>>> 
>>>>>> On 30.06.2011 at 17:43, Laurent Laborde <[email protected]> wrote:
>>>>>> 
>>>>>>> Friendly greetings!
>>>>>>> I've been stuck on the same problem for many days (an hour per day)
>>>>>>> and I can't find a solution.
>>>>>>> I have 2 indexes (see source code below).
>>>>>>> No problem with the "parsed" index, but the "url" index never returns 
>>>>>>> any result.
>>>>>>> I don't know if it's because the url isn't indexed or because the
>>>>>>> query on the index is wrong.
>>>>>>> Or something else?
>>>>>>> 
>>>>>>> Could you please take a look and see what's wrong?
>>>>>>> Thank you
>>>>>>> 
>>>>>>> (you can try to run the script, it works)
>>>>>>> 
>>>>>>> require 'nokogiri'
>>>>>>> require 'open-uri'
>>>>>>> require 'neography'
>>>>>>> 
>>>>>>> #init neography
>>>>>>> @neo = Neography::Rest.new
>>>>>>> neo_root = @neo.get_root
>>>>>>> 
>>>>>>> domaine = 'http://www.over-blog.com/'
>>>>>>> parsed_idx = "ob_parsed_idx"
>>>>>>> url_idx = "ob_url_idx"
>>>>>>> 
>>>>>>> #FIRST RUN
>>>>>>> #ob_root_node = @neo.create_node("domaine" => domaine, "parsed" => "false", "url" => domaine)
>>>>>>> #@neo.create_relationship("obgraph", neo_root, ob_root_node)
>>>>>>> #pidx = @neo.create_node_index(parsed_idx)
>>>>>>> #uidx = @neo.create_node_index(url_idx)
>>>>>>> #@neo.add_node_to_index(parsed_idx, "parsed", "false", ob_root_node)
>>>>>>> ##@neo.add_node_to_index(url_idx, "url", domaine, ob_root_node)
>>>>>>> #node_to_parse = @neo.get_node_index(parsed_idx, "parsed", "false")
>>>>>>> 
>>>>>>> ob_root_node = @neo.traverse(neo_root, "nodes", { "relationships" =>
>>>>>>> [{"type"=> "obgraph", "direction" => "out" }], "depth" => 1})
>>>>>>> #node_to_parse = @neo.traverse(ob_root_node, "nodes", {
>>>>>>> #  "relationships" => [{"type"=> "link", "direction" => "out" }] })
>>>>>>> node_to_parse = @neo.get_node_index(parsed_idx, "parsed", "false")
>>>>>>> 
>>>>>>> #print @neo.list_node_indexes
>>>>>>> 
>>>>>>> node_to_parse.each do |node|
>>>>>>> 
>>>>>>>    url_to_parse = @neo.get_node_properties(node)["url"]
>>>>>>>    printf("exploring : %s\n", url_to_parse)
>>>>>>> 
>>>>>>>    doc = Nokogiri::HTML(open(url_to_parse))
>>>>>>>    @neo.set_node_properties(node, {"parsed" => "true"})
>>>>>>>    @neo.remove_node_from_index(parsed_idx, node)
>>>>>>>    @neo.add_node_to_index(parsed_idx, "parsed", "true", node)
>>>>>>> 
>>>>>>>    doc.xpath('//a').each do |link|
>>>>>>> 
>>>>>>>        link_text = link.content.strip()
>>>>>>>        link_url = link['href'].to_s().strip()
>>>>>>>        link_title = link['title'].to_s().strip()
>>>>>>> 
>>>>>>>        link_url = link_url.sub(/#.*$/, "")
>>>>>>> 
>>>>>>>        if(link_url =~ /^\/.*/)
>>>>>>>            link_url = link_url.sub(/^\//, '')
>>>>>>>            link_url = domaine + link_url
>>>>>>>        end
>>>>>>> 
>>>>>>>        if(link_text == '')
>>>>>>>            link_text = link_title
>>>>>>>        end
>>>>>>> 
>>>>>>> 
>>>>>>>        # skipping empty stuff
>>>>>>>        next if link_url.empty?
>>>>>>>        next if link_text.empty?
>>>>>>> 
>>>>>>>        node_found = @neo.find_node_index(url_idx, "url", link_url)
>>>>>>>        #node_found = @neo.traverse(ob_root_node, "nodes", {
>>>>>>>        #  "relationships" => [{"direction" => "out" }], "prune evaluator" =>
>>>>>>>        #  {"language" => "javascript", "body" =>
>>>>>>>        #  "position.endNode().getProperty(url) == #{link_url};"},
>>>>>>>        #  "return filter" => {"language" => "builtin", "name" => "all but start node"}})
>>>>>>>        print "\nsearching url #{link_url}\n"
>>>>>>>        printf("node_found : %s \n", node_found)
>>>>>>>        if(node_found.nil?)
>>>>>>>            printf("create node %s\n", link_url)
>>>>>>>            nnode = @neo.create_node("parsed" => "false", "url" => 
>>>>>>> link_url)
>>>>>>>            @neo.add_node_to_index(url_idx, "url", link_url, nnode)
>>>>>>>            @neo.add_node_to_index(parsed_idx, "parsed", "false", nnode)
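>>>>>>>        else
>>>>>>>            # reuse the existing node so nnode is not nil below
>>>>>>>            # (assumes find_node_index returns an array of matches)
>>>>>>>            nnode = node_found.first
>>>>>>>        end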
>>>>>>> 
>>>>>>> 
>>>>>>>        nrel = @neo.create_relationship("link", node, nnode)
>>>>>>>        @neo.set_relationship_properties(nrel, {"text" => link_text})
>>>>>>> 
>>>>>>>        #printf("%s => %s\n", link_text, link_url)
>>>>>>> 
>>>>>>>    end
>>>>>>> 
>>>>>>>    sleep(1.0)
>>>>>>> 
>>>>>>> 
>>>>>>> end
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Laurent "ker2x" Laborde
>>>>>>> Sysadmin & DBA at http://www.over-blog.com/
>>>>>>> _______________________________________________
>>>>>>> Neo4j mailing list
>>>>>>> [email protected]
>>>>>>> https://lists.neo4j.org/mailman/listinfo/user
