Re: [Neo4j] problem with neography and index.

Laurent Laborde Fri, 01 Jul 2011 02:23:28 -0700

After a few runs (with more and more pages to crawl), it seems
that the result returned by the index is too big:


/home/ker2x/.rvm/gems/ruby-1.9.2-p180/gems/crack-0.1.8/lib/crack/json.rb:54:
stack level too deep (SystemStackError)

Any ideas? Is there a workaround?

Thank you.

-- 
Ker2x

On Fri, Jul 1, 2011 at 10:44 AM, Laurent Laborde <[email protected]> wrote:
> I used Base64.encode64 instead! It still didn't work.
> So I used Base64.encode64 and get_node_index instead of
> find_node_index, and it worked \o/
>
> --
> Ker2x
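
A minimal sketch of the Base64 workaround mentioned above, assuming the only goal is an index value that contains no ":" or "/". The helper names url_to_index_key and index_key_to_url are hypothetical. Note that Base64.encode64 wraps its output and appends a newline, so the sketch uses Base64.urlsafe_encode64, which is a swapped-in variant and not what the original script used.

```ruby
require 'base64'

# Hypothetical helper: turn a URL into an index-safe key.
# urlsafe_encode64 emits no newlines and uses "-" and "_" instead of
# "+" and "/", so the key never contains a path separator.
def url_to_index_key(url)
  Base64.urlsafe_encode64(url)
end

# Hypothetical helper: recover the original URL from the key.
def index_key_to_url(key)
  Base64.urlsafe_decode64(key)
end

url = 'http://www.over-blog.com/'
key = url_to_index_key(url)

puts key
puts index_key_to_url(key)

# For comparison: plain encode64 always ends with a newline.
puts Base64.encode64(url).end_with?("\n")
```

The same key has to be used both when adding a node to the index and when querying it, otherwise the lookup cannot match.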
>
> On Fri, Jul 1, 2011 at 10:25 AM, Laurent Laborde <[email protected]> wrote:
>> Thank you for your help.
>> As you probably noticed, I'm not good with Ruby (I'm a sysadmin ^^).
>>
>> I tried using URI.encode, but it doesn't work as expected.
>>
>> irb(main):001:0> require 'uri'
>> => true
>> irb(main):002:0> puts URI.escape("http://www.over.blog.com/")
>> http://www.over.blog.com/
>> => nil
>> irb(main):003:0> puts URI.encode("http://www.over.blog.com/")
>> http://www.over.blog.com/
>> => nil
>>
>> I guess that the output should be more like
>> "http%3A%2F%2Fwww.over-blog.com%2", shouldn't it?
>>
>> --
>> Ker2x
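
URI.escape and URI.encode leave reserved characters such as ":" and "/" untouched, which is why the irb output above is unchanged. A minimal sketch of one alternative from the standard library, CGI.escape, which does percent-encode them. The helper name escape_index_value is hypothetical, and whether this alone is enough for the index lookup is not confirmed by this thread.

```ruby
require 'cgi'

# Hypothetical helper: percent-encode a value so that reserved
# characters (":" and "/" in particular) are escaped.
# Note that CGI.escape turns a space into "+", not "%20".
def escape_index_value(value)
  CGI.escape(value)
end

puts escape_index_value('http://www.over-blog.com/')
# prints http%3A%2F%2Fwww.over-blog.com%2F
```

CGI.unescape reverses the encoding, so the original URL can be recovered from the escaped form.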
>>
>> On Thu, Jun 30, 2011 at 6:40 PM, Michael Hunger
>> <[email protected]> wrote:
>>> You have to escape the URL index value;
>>> otherwise the Jersey REST framework consumes it silently. I had this
>>> problem when working on the birdies demo app. It took me a while to work that
>>> out.
>>>
>>> see http://github.com/jexp/birdies
>>> and http://birdies.heroku.com
>>>
>>> Michael
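
To illustrate why an unescaped URL is a problem, here is a rough sketch of how an index lookup path could be assembled. It assumes the value travels as the last segment of a path shaped like /index/node/<index>/<key>/<value>. That layout and the helper name index_lookup_path are assumptions made for illustration, not taken from this thread.

```ruby
require 'cgi'

# Hypothetical helper: build a lookup path, escaping every segment so
# that "/" and ":" inside the value are not read as path structure.
# The "/index/node/..." layout is an assumption.
def index_lookup_path(index, key, value)
  segments = [index, key, value].map { |segment| CGI.escape(segment.to_s) }
  '/index/node/' + segments.join('/')
end

puts index_lookup_path('ob_url_idx', 'url', 'http://www.over-blog.com/')
# prints /index/node/ob_url_idx/url/http%3A%2F%2Fwww.over-blog.com%2F
```

Without the escaping step, the slashes inside the URL would add extra path segments and the server would no longer see a single value.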
>>>
>>> Sent from my iBrick4
>>>
>>>
>>> On 30.06.2011 at 17:43, Laurent Laborde <[email protected]> wrote:
>>>
>>>> Friendly greetings!
>>>> I've been stuck on the same problem for many days (an hour per day) and I can't
>>>> find a solution.
>>>> I have 2 indexes (see source code below).
>>>> No problem with the "parsed" index, but the "url" index never returns any
>>>> result.
>>>> I don't know if it's because the URL isn't indexed or because the query on
>>>> the index is wrong.
>>>> Or something else?
>>>>
>>>> Could you please take a look and see what's wrong?
>>>> Thank you.
>>>>
>>>> (You can try to run the script; it works.)
>>>>
>>>> require 'nokogiri'
>>>> require 'open-uri'
>>>> require 'neography'
>>>>
>>>> #init neography
>>>> @neo = Neography::Rest.new
>>>> neo_root = @neo.get_root
>>>>
>>>> domaine = 'http://www.over-blog.com/'
>>>> parsed_idx = "ob_parsed_idx"
>>>> url_idx = "ob_url_idx"
>>>>
>>>> #FIRST RUN
>>>> #ob_root_node = @neo.create_node("domaine" => domaine, "parsed" =>
>>>> "false", "url" => domaine)
>>>> #@neo.create_relationship("obgraph", neo_root, ob_root_node)
>>>> #pidx = @neo.create_node_index(parsed_idx)
>>>> #uidx = @neo.create_node_index(url_idx)
>>>> #@neo.add_node_to_index(parsed_idx, "parsed", "false", ob_root_node)
>>>> ##@neo.add_node_to_index(url_idx, "url", domaine, ob_root_node)
>>>> #node_to_parse = @neo.get_node_index(parsed_idx, "parsed", "false")
>>>>
>>>> ob_root_node = @neo.traverse(neo_root, "nodes", { "relationships" =>
>>>> [{"type"=> "obgraph", "direction" => "out" }], "depth" => 1})
>>>> #node_to_parse = @neo.traverse(ob_root_node, "nodes", {
>>>> "relationships" => [{"type"=> "link", "direction" => "out" }] })
>>>> node_to_parse = @neo.get_node_index(parsed_idx, "parsed", "false")
>>>>
>>>> #print @neo.list_node_indexes
>>>>
>>>> node_to_parse.each do |node|
>>>>
>>>>    url_to_parse = @neo.get_node_properties(node)["url"]
>>>>    printf("exploring : %s\n", url_to_parse)
>>>>
>>>>    doc = Nokogiri::HTML(open(url_to_parse))
>>>>    @neo.set_node_properties(node, {"parsed" => "true"})
>>>>    @neo.remove_node_from_index(parsed_idx, node)
>>>>    @neo.add_node_to_index(parsed_idx, "parsed", "true", node)
>>>>
>>>>    doc.xpath('//a').each do |link|
>>>>
>>>>        link_text = link.content.strip()
>>>>        link_url = link['href'].to_s().strip()
>>>>        link_title = link['title'].to_s().strip()
>>>>
>>>>        link_url = link_url.sub(/#.*$/, "")
>>>>
>>>>        if(link_url =~ /^\/.*/)
>>>>            link_url = link_url.sub(/^\//, '')
>>>>            link_url = domaine + link_url
>>>>        end
>>>>
>>>>        if(link_text == '')
>>>>            link_text = link_title
>>>>        end
>>>>
>>>>
>>>>        # skip links with an empty URL or empty text
>>>>        next if link_url.empty?
>>>>        next if link_text.empty?
>>>>
>>>>        node_found = @neo.find_node_index(url_idx, "url", link_url)
>>>>        #node_found = @neo.traverse(ob_root_node, "nodes", {
>>>> "relationships" => [{"direction" => "out" }], "prune evaluator" =>
>>>> {"language" => "javascript", "body" =>
>>>> "position.endNode().getProperty(url) == #{link_url};"}, "return
>>>> filter" => {"language" => "builtin",  "name" => "all but start
>>>> node"}})
>>>>        print "\nsearching url #{link_url}\n"
>>>>        printf("node_found : %s \n", node_found)
>>>>        if(node_found.nil?)
>>>>            printf("create node %s\n", link_url)
>>>>            nnode = @neo.create_node("parsed" => "false", "url" => link_url)
>>>>            @neo.add_node_to_index(url_idx, "url", link_url, nnode)
>>>>            @neo.add_node_to_index(parsed_idx, "parsed", "false", nnode)
>>>>        else
>>>>            # reuse the indexed node so the relationship below has a target
>>>>            # (assumes the lookup returns an array of matching nodes)
>>>>            nnode = node_found.first
>>>>            printf("node_found : %s \n", node_found)
>>>>        end
>>>>
>>>>
>>>>        nrel = @neo.create_relationship("link", node, nnode)
>>>>        @neo.set_relationship_properties(nrel, {"text" => link_text})
>>>>
>>>>        #printf("%s => %s\n", link_text, link_url)
>>>>
>>>>    end
>>>>
>>>>    sleep(1.0)
>>>>
>>>>
>>>> end
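
The lookup-then-create step in the loop above can be sketched as a single helper. This is a sketch under stated assumptions, not the script's actual fix. FakeNeo is a hypothetical in-memory stand-in, so the example runs without a Neo4j server. It imitates only create_node, add_node_to_index and get_node_index, and it assumes a lookup returns nil when nothing matches. find_or_create_url_node is also a hypothetical name.

```ruby
require 'base64'

# Hypothetical in-memory stand-in for the REST client. Returning nil
# for "no hit" is an assumption about the real client's behaviour.
class FakeNeo
  def initialize
    @indexes = Hash.new { |h, k| h[k] = Hash.new { |hh, kk| hh[kk] = [] } }
    @next_id = 0
  end

  def create_node(props)
    @next_id += 1
    { 'self' => "node/#{@next_id}", 'data' => props }
  end

  def add_node_to_index(index, key, value, node)
    @indexes[index][[key, value]] << node
  end

  def get_node_index(index, key, value)
    hits = @indexes[index][[key, value]]
    hits.empty? ? nil : hits
  end
end

# Return the node already indexed under this URL, or create and index
# a new one. Either way the caller gets a node to link to.
def find_or_create_url_node(neo, url_idx, parsed_idx, url)
  key = Base64.urlsafe_encode64(url) # index-safe form of the URL
  found = neo.get_node_index(url_idx, 'url', key)
  return found.first if found

  node = neo.create_node('parsed' => 'false', 'url' => url)
  neo.add_node_to_index(url_idx, 'url', key, node)
  neo.add_node_to_index(parsed_idx, 'parsed', 'false', node)
  node
end

neo = FakeNeo.new
first  = find_or_create_url_node(neo, 'ob_url_idx', 'ob_parsed_idx', 'http://www.over-blog.com/')
second = find_or_create_url_node(neo, 'ob_url_idx', 'ob_parsed_idx', 'http://www.over-blog.com/')
puts first.equal?(second)
```

Because the helper always returns a node, the relationship can be created for links that were seen before as well as for new ones.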
>>>>
>>>>
>>>> --
>>>> Laurent "ker2x" Laborde
>>>> Sysadmin & DBA at http://www.over-blog.com/
>>>> _______________________________________________
>>>> Neo4j mailing list
>>>> [email protected]
>>>> https://lists.neo4j.org/mailman/listinfo/user