Re: [Neo4j] problem with neography and index.

Max De Marzi Jr. Fri, 01 Jul 2011 15:35:38 -0700

I don't have paging in yet... I've been slacking I know.  I'll get to it soon.


On Fri, Jul 1, 2011 at 7:27 AM, Michael Hunger
<[email protected]> wrote:
> You have a few options here:
>
> * paging is currently supported in the REST API only for traversals; the other 
> request types will get it in 1.5 (so you could use the paging functionality 
> for your traverser, though I don't know if Max already supports that in neography)
> * you could use either the cypher 
> (http://docs.neo4j.org/chunked/snapshot/cypher-plugin.html) or the gremlin 
> plugin (http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html)  to write 
> a request against their "execute_script" api which then does the index lookup 
> + limit
>
> * you can write your own server plugin doing that 
> http://docs.neo4j.org/chunked/snapshot/server-plugins.html
>
> Cheers
>
> Michael
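
The plugin option above can be sketched as a request that asks the server to do the index lookup and cut the result off itself. This is only a sketch under assumptions: the endpoint path, the JSON body shape and the Gremlin index/range syntax are recalled from the 1.x plugin documentation and should be checked against your server version; `limited_lookup_request` is a hypothetical helper, not part of neography. The request is built but not sent.

```ruby
require "json"
require "net/http"

# Assumption: path of the Gremlin plugin's script endpoint on a 1.x server.
GREMLIN_PATH = "/db/data/ext/GremlinPlugin/graphdb/execute_script"

# Build (but do not send) a POST request that asks the server to look up
# key/value in the named index and return at most `limit` nodes.
def limited_lookup_request(index, key, value, limit)
  # Assumption: Gremlin 1.x index lookup g.idx(...)[[key:'value']] followed
  # by a range filter [0..n]. Values are interpolated as-is, so only pass
  # trusted strings.
  script = "g.idx('#{index}')[[#{key}:'#{value}']][0..#{limit - 1}]"
  request = Net::HTTP::Post.new(GREMLIN_PATH, "Content-Type" => "application/json")
  request.body = JSON.generate("script" => script)
  request
end

request = limited_lookup_request("ob_parsed_idx", "parsed", "false", 100)
puts request.body
```

To actually send it, something like `Net::HTTP.start("localhost", 7474) { |http| http.request(request) }` would do, assuming the server listens on the default port.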
>
> On 01.07.2011 at 14:04, Laurent Laborde wrote:
>
>> Ruby crashes when I request all the pages with parsed = false.
>> Using the REST interface directly with curl, the result is a huge
>> JSON document with ~10,000 nodes.
>>
>> Is there a way to limit the result size, like a SQL "SELECT * from
>> node where parsed == 'true' limit 100;"?
>> I tried using a traverser instead of querying the index:
>>
>> node_to_parse = @neo.traverse(ob_root_node, "nodes", {
>>   "relationships" => [{"type" => "link", "direction" => "out"}],
>>   "prune evaluator" => {"language" => "javascript", "body" =>
>>     "position.endNode().getProperty('parsed') == 'false';"},
>>   "return filter" => {"language" => "builtin", "name" => "all but start node"}})
>>
>>
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/protocol.rb:140:in
>> `rescue in rbuf_fill': Timeout::Error (Timeout::Error)
>>       from 
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/protocol.rb:134:in
>> `rbuf_fill'
>>       from 
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/protocol.rb:116:in
>> `readuntil'
>>       from 
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/protocol.rb:126:in
>> `readline'
>>       from 
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:2219:in
>> `read_status_line'
>>       from 
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:2208:in
>> `read_new'
>>       from 
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:1191:in
>> `transport_request'
>>       from 
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:1177:in
>> `request'
>>       from 
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:1170:in
>> `block in request'
>>       from 
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:627:in
>> `start'
>>       from 
>> /home/ker2x/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/http.rb:1168:in
>> `request'
>>       from 
>> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/httparty-0.7.8/lib/httparty/request.rb:69:in
>> `perform'
>>       from 
>> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/httparty-0.7.8/lib/httparty.rb:390:in
>> `perform_request'
>>       from 
>> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/httparty-0.7.8/lib/httparty.rb:358:in
>> `post'
>>       from 
>> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/httparty-0.7.8/lib/httparty.rb:426:in
>> `post'
>>       from 
>> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/neography-0.0.13/lib/neography/rest.rb:363:in
>> `post'
>>       from 
>> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/neography-0.0.13/lib/neography/rest.rb:317:in
>> `traverse'
>>       from nokogiri-test.rb:26:in `<main>'
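
The trace above ends in Net::HTTP giving up while waiting for the first line of the response; its default read timeout is 60 seconds. One possible workaround, separate from limiting the result size, is a longer read timeout. A minimal sketch with plain Net::HTTP follows; `neo4j_http` is a hypothetical helper, and whether neography 0.0.13 lets you pass such a setting through HTTParty is not something this thread confirms.

```ruby
require "net/http"

# Build a client for the Neo4j REST server with an explicit read timeout.
# Nothing is connected until a request is actually made.
def neo4j_http(host = "localhost", port = 7474, read_timeout: 300)
  http = Net::HTTP.new(host, port)
  http.open_timeout = 5             # seconds to wait for the TCP connection
  http.read_timeout = read_timeout  # seconds to wait for each read of the response
  http
end

http = neo4j_http
puts http.read_timeout
```

A longer timeout only hides the cost of a traversal that returns thousands of nodes; shrinking the result on the server side is still the better fix.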
>>
>>
>> --
>> Keru
>>
>> On Fri, Jul 1, 2011 at 1:15 PM, Michael Hunger
>> <[email protected]> wrote:
>>> Can you call the index REST request manually and see what it returns?
>>>
>>> see here 
>>> http://components.neo4j.org/neo4j-server/snapshot/rest.html#Index_search_-_Exact_keyvalue_lookup
>>>
>>> curl -H Accept:application/json http://localhost:7474/db/data/index/node/my_nodes/the_key/the_value%20with%20space
>>>
>>> see here: 
>>> http://stackoverflow.com/questions/547127/in-ruby-how-do-i-replace-the-question-mark-character-in-a-string
>>>
>>> require "addressable/uri"
>>> Addressable::URI.encode_component("http://test.com?test/test%test",Addressable::URI::CharacterClasses::PATH)
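
As an illustration of the lookup URL used in the curl example, here is a small sketch that percent-encodes each path segment with the standard library instead of Addressable. `index_lookup_path` is a hypothetical helper name, and the `/db/data/index/node/` prefix is simply taken from the curl line above.

```ruby
require "erb"

# Build the path for an exact key/value index lookup, percent-encoding each
# segment so that characters such as ":" "/" and " " cannot be mistaken for
# URL structure.
def index_lookup_path(index, key, value)
  segments = [index, key, value].map { |segment| ERB::Util.url_encode(segment) }
  "/db/data/index/node/" + segments.join("/")
end

puts index_lookup_path("my_nodes", "the_key", "the value with space")
puts index_lookup_path("ob_url_idx", "url", "http://www.over-blog.com/")
```

Whether the server accepts an encoded slash (`%2F`) inside a path segment is a separate question that this sketch does not answer.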
>>>
>>> Cheers
>>>
>>> Michael
>>>
>>>
>>> On 01.07.2011 at 11:23, Laurent Laborde wrote:
>>>
>>>> After a few runs (and more and more pages to crawl), it seems
>>>> that the result returned by the index is too big:
>>>>
>>>> /home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/crack-0.1.8/lib/crack/json.rb:54:
>>>> stack level too deep (SystemStackError)
>>>>
>>>> Any idea? Workaround?
>>>>
>>>> thank you
>>>>
>>>> --
>>>> Ker2x
>>>>
>>>> On Fri, Jul 1, 2011 at 10:44 AM, Laurent Laborde <[email protected]> 
>>>> wrote:
>>>>> I used Base64.encode64 instead! It still didn't work.
>>>>> So I used Base64.encode64 and get_node_index instead of
>>>>> find_node_index, and it worked \o/
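
One caveat with the Base64 workaround, sketched below: `Base64.encode64` appends a newline (and wraps long output every 60 characters), and the standard alphabet contains `+` and `/`, which are again awkward inside a URL path. `Base64.urlsafe_encode64` avoids all three. This only illustrates the encoding itself; it assumes the same encoded string is used both when indexing and when looking up.

```ruby
require "base64"

url = "http://www.over-blog.com/"

plain = Base64.encode64(url)         # standard alphabet, trailing "\n"
safe  = Base64.urlsafe_encode64(url) # "-" and "_" instead of "+" and "/", no newlines

puts plain.inspect
puts safe
puts Base64.urlsafe_decode64(safe)   # round-trips back to the original URL
```

The index then stores the encoded string as the value, so every lookup has to encode the URL in exactly the same way.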
>>>>>
>>>>> --
>>>>> Ker2x
>>>>>
>>>>> On Fri, Jul 1, 2011 at 10:25 AM, Laurent Laborde <[email protected]> 
>>>>> wrote:
>>>>>> Thank you for your help.
>>>>>> As you probably noticed, I'm not good with Ruby (I'm a sysadmin ^^).
>>>>>>
>>>>>> I tried using URI.encode but it doesn't work as expected.
>>>>>>
>>>>>> irb(main):001:0> require 'uri'
>>>>>> => true
>>>>>> irb(main):002:0> puts URI.escape("http://www.over.blog.com/")
>>>>>> http://www.over.blog.com/
>>>>>> => nil
>>>>>> irb(main):003:0> puts URI.encode("http://www.over.blog.com/")
>>>>>> http://www.over.blog.com/
>>>>>> => nil
>>>>>>
>>>>>> I guess that the output should be more like
>>>>>> "http%3A%2F%2Fwww.over-blog.com%2F", shouldn't it?
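
For context on the irb session above: `URI.escape` (alias `URI.encode`) only escaped characters that are not allowed anywhere in a URI, so `:` and `/` passed through untouched; both methods were removed in Ruby 3.0. To encode a whole URL as a single component, the standard library has component encoders. A short sketch, with the caveat that they differ in how they treat spaces:

```ruby
require "erb"
require "uri"

url = "http://www.over-blog.com/"

# Both encoders escape ":" and "/" because they treat the input as one
# opaque component rather than as a URL with structure.
puts URI.encode_www_form_component(url)
puts ERB::Util.url_encode(url)

# They differ for spaces: form encoding uses "+", url_encode uses "%20".
puts URI.encode_www_form_component("a b")
puts ERB::Util.url_encode("a b")
```

For a value that ends up in a URL path rather than in a query string, the `%20` form is usually the safer choice.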
>>>>>>
>>>>>> --
>>>>>> Ker2x
>>>>>>
>>>>>> On Thu, Jun 30, 2011 at 6:40 PM, Michael Hunger
>>>>>> <[email protected]> wrote:
>>>>>>> You have to escape the URL that you use as the index value;
>>>>>>> otherwise the Jersey REST framework consumes it silently. I had this 
>>>>>>> problem when working on the birdies demo app. It took me a while to work 
>>>>>>> that out.
>>>>>>>
>>>>>>> see http://github.com/jexp/birdies
>>>>>>> and http://birdies.heroku.com
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>>> On 30.06.2011 at 17:43, Laurent Laborde <[email protected]> wrote:
>>>>>>>
>>>>>>>> Friendly greetings!
>>>>>>>> I've been stuck on the same problem for many days (an hour per day) and I can't
>>>>>>>> find a solution.
>>>>>>>> I have 2 indexes (see source code below).
>>>>>>>> No problem with the "parsed" index, but the "url" index never returns 
>>>>>>>> any result.
>>>>>>>> I don't know if it's because the url isn't indexed or because the query on
>>>>>>>> the index is wrong.
>>>>>>>> Or something else?
>>>>>>>>
>>>>>>>> Could you please take a look and see what's wrong?
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> (you can try to run the script, it works)
>>>>>>>>
>>>>>>>> require 'nokogiri'
>>>>>>>> require 'open-uri'
>>>>>>>> require 'neography'
>>>>>>>>
>>>>>>>> #init neography
>>>>>>>> @neo = Neography::Rest.new
>>>>>>>> neo_root = @neo.get_root
>>>>>>>>
>>>>>>>> domaine = 'http://www.over-blog.com/'
>>>>>>>> parsed_idx = "ob_parsed_idx"
>>>>>>>> url_idx = "ob_url_idx"
>>>>>>>>
>>>>>>>> #FIRST RUN
>>>>>>>> #ob_root_node = @neo.create_node("domaine" => domaine, "parsed" =>
>>>>>>>> #  "false", "url" => domaine)
>>>>>>>> #@neo.create_relationship("obgraph", neo_root, ob_root_node)
>>>>>>>> #pidx = @neo.create_node_index(parsed_idx)
>>>>>>>> #uidx = @neo.create_node_index(url_idx)
>>>>>>>> #@neo.add_node_to_index(parsed_idx, "parsed", "false", ob_root_node)
>>>>>>>> ##@neo.add_node_to_index(url_idx, "url", domaine, ob_root_node)
>>>>>>>> #node_to_parse = @neo.get_node_index(parsed_idx, "parsed", "false")
>>>>>>>>
>>>>>>>> ob_root_node = @neo.traverse(neo_root, "nodes", { "relationships" =>
>>>>>>>> [{"type"=> "obgraph", "direction" => "out" }], "depth" => 1})
>>>>>>>> #node_to_parse = @neo.traverse(ob_root_node, "nodes", {
>>>>>>>> #  "relationships" => [{"type"=> "link", "direction" => "out" }] })
>>>>>>>> node_to_parse = @neo.get_node_index(parsed_idx, "parsed", "false")
>>>>>>>>
>>>>>>>> #print @neo.list_node_indexes
>>>>>>>>
>>>>>>>> node_to_parse.each do |node|
>>>>>>>>
>>>>>>>>    url_to_parse = @neo.get_node_properties(node)["url"]
>>>>>>>>    printf("exploring : %s\n", url_to_parse)
>>>>>>>>
>>>>>>>>    doc = Nokogiri::HTML(open(url_to_parse))
>>>>>>>>    @neo.set_node_properties(node, {"parsed" => "true"})
>>>>>>>>    @neo.remove_node_from_index(parsed_idx, node)
>>>>>>>>    @neo.add_node_to_index(parsed_idx, "parsed", "true", node)
>>>>>>>>
>>>>>>>>    doc.xpath('//a').each do |link|
>>>>>>>>
>>>>>>>>        link_text = link.content.strip()
>>>>>>>>        link_url = link['href'].to_s().strip()
>>>>>>>>        link_title = link['title'].to_s().strip()
>>>>>>>>
>>>>>>>>        link_url = link_url.sub(/#.*$/, "")
>>>>>>>>
>>>>>>>>        if(link_url =~ /^\/.*/)
>>>>>>>>            link_url = link_url.sub(/^\//, '')
>>>>>>>>            link_url = domaine + link_url
>>>>>>>>        end
>>>>>>>>
>>>>>>>>        if(link_text == '')
>>>>>>>>            link_text = link_title
>>>>>>>>        end
>>>>>>>>
>>>>>>>>
>>>>>>>>        #skipping empty stuff
>>>>>>>>        next if link_url.empty?
>>>>>>>>        next if link_text.empty?
>>>>>>>>
>>>>>>>>        node_found = @neo.find_node_index(url_idx, "url", link_url)
>>>>>>>>        #node_found = @neo.traverse(ob_root_node, "nodes", {
>>>>>>>>        #  "relationships" => [{"direction" => "out" }], "prune evaluator" =>
>>>>>>>>        #  {"language" => "javascript", "body" =>
>>>>>>>>        #  "position.endNode().getProperty(url) == #{link_url};"},
>>>>>>>>        #  "return filter" => {"language" => "builtin", "name" => "all but start node"}})
>>>>>>>>        print "\nsearching url #{link_url}\n"
>>>>>>>>        printf("node_found : %s \n", node_found)
>>>>>>>>        if(node_found.nil?)
>>>>>>>>            printf("create node %s\n", link_url)
>>>>>>>>            nnode = @neo.create_node("parsed" => "false", "url" => 
>>>>>>>> link_url)
>>>>>>>>            @neo.add_node_to_index(url_idx, "url", link_url, nnode)
>>>>>>>>            @neo.add_node_to_index(parsed_idx, "parsed", "false", nnode)
>>>>>>>>        else
>>>>>>>>            printf("node_found : %s \n", node_found)
>>>>>>>>        end
>>>>>>>>
>>>>>>>>
>>>>>>>>        nrel = @neo.create_relationship("link", node, nnode)
>>>>>>>>        @neo.set_relationship_properties(nrel, {"text" => link_text})
>>>>>>>>
>>>>>>>>        #printf("%s => %s\n", link_text, link_url)
>>>>>>>>
>>>>>>>>    end
>>>>>>>>
>>>>>>>>    sleep(1.0)
>>>>>>>>
>>>>>>>>
>>>>>>>> end
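
Two details of the script above are worth a sketch. First, the regex only rewrites hrefs that start with "/", so other relative links are left as-is; `URI.join` resolves both kinds. Second, `nnode` is only assigned when the URL was not found, so the relationship is created with an unset target whenever the lookup succeeds. The code below shows a find-or-create flow using an in-memory stand-in; `FakeGraph` and `normalize_link` are hypothetical names for illustration and are not neography calls.

```ruby
require "uri"

# Resolve an href against the page URL and drop any "#fragment".
# Returns nil when the href cannot be parsed as a URI.
def normalize_link(base_url, href)
  uri = URI.join(base_url, href)
  uri.fragment = nil
  uri.to_s
rescue URI::Error
  nil
end

# Hypothetical in-memory stand-in for the graph and its "url" index.
class FakeGraph
  attr_reader :nodes, :links

  def initialize
    @url_index = {}
    @nodes = []
    @links = []
  end

  # Return the node indexed under this URL, creating and indexing it first
  # if it does not exist yet. The caller always gets a usable node back.
  def find_or_create(url)
    @url_index[url] ||= begin
      node = { "url" => url, "parsed" => "false" }
      @nodes << node
      node
    end
  end

  def link(from, to, text)
    @links << [from, to, text]
  end
end

graph = FakeGraph.new
page = graph.find_or_create("http://www.over-blog.com/")

["/about#top", "about", "http://www.over-blog.com/about"].each do |href|
  url = normalize_link(page["url"], href)
  next if url.nil?
  target = graph.find_or_create(url) # found or newly created, never unset
  graph.link(page, target, href)
end

puts graph.nodes.length
puts graph.links.length
```

With a real index the same shape applies: look the URL up, create and index the node only if nothing came back, then create the relationship to whichever node you ended up with.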
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Laurent "ker2x" Laborde
>>>>>>>> Sysadmin & DBA at http://www.over-blog.com/
>>>>>>>> _______________________________________________
>>>>>>>> Neo4j mailing list
>>>>>>>> [email protected]
>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user
