Using solr graph to traverse N relationships

Nightingale, Jonathan A (US) Wed, 13 Mar 2019 08:20:42 -0700

Hi,
I originally posted this question on Stack Overflow, where it was suggested I 
use this mailing list instead, so I'm sending it here as well. Here's the 
original link if you'd prefer to answer there. I've also copied the question 
into the body of this email.


https://stackoverflow.com/questions/55130208/using-solr-graph-to-traverse-n-relationships

I'm investigating whether I can use an existing Solr store to do graph 
traversal. It would be ideal not to have to duplicate the data in a graph 
store. I was playing with the Solr streaming capabilities and the nodes 
(gatherNodes) source. I have three problems with it and I'm wondering if people 
have found solutions:
1) Getting the original documents that the nodes reference, with all of their 
fields. I did eventually solve this by doing an innerJoin between the nodes 
returned by gatherNodes and a query against "*:*", but this seems less than 
ideal. Is there a better way to do this? Even better would be if I could do it 
as an "export" and not a "select", to better handle large amounts of data. This 
problem is small compared to the other two, which seem like major bugs in Solr.
2) I can't traverse to nodes from a field that has more than one value. The 
nodes stream source definition has a walk parameter:
nodes(collection,
      search(some search params),
      walk="ref->id",
      gather="vals")

In this example it's walking from the search results, taking the field "ref" 
on those docs and finding all nodes whose id matches that value. This works 
until ref becomes a list of values. Has anyone had success making this work? A 
simple example would be a tree structure where you have a folder document with 
a multiValued field representing its subfolders and files. How would I walk 
that relationship?
3) In that example the gather returns the nodes that are represented by the 
"vals" field on all the nodes that result from the walk. This also does not 
work if that field is multiValued. Has anyone had any success with this? Again 
going back to the files and folders example, I want to return all the files in 
the subfolders of the selected folder:
nodes(collection,
      search(collection, q="path:currentFolder", qt="/select", sort="fileId ASC"),
      walk="contents->fileId",
      gather="contents",
      fq="type:file")

I made this up, so there may be some typos, but the premise is that contents is 
a multiValued string field and every document, whether of type "file" or 
"folder", has a fileId, which is what the contents field references. How would 
I accomplish this? Do these fields need to be indexed in a special way?
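
To make the intended result concrete, here is a minimal Python sketch of the traversal described above. It is not Solr code and says nothing about how nodes() works internally. The sample documents, the "path" values and the function name files_under are all made up for illustration; only the fileId, type and contents fields come from the example.

```python
from collections import deque

# Hypothetical in-memory stand-in for the collection. Folders list the
# fileIds of their children in the multi-valued "contents" field.
DOCS = [
    {"fileId": "root", "type": "folder", "path": "currentFolder",
     "contents": ["sub1", "a.txt"]},
    {"fileId": "sub1", "type": "folder", "path": "currentFolder/sub1",
     "contents": ["b.txt", "sub2"]},
    {"fileId": "sub2", "type": "folder", "path": "currentFolder/sub1/sub2",
     "contents": ["c.txt"]},
    {"fileId": "a.txt", "type": "file", "contents": []},
    {"fileId": "b.txt", "type": "file", "contents": []},
    {"fileId": "c.txt", "type": "file", "contents": []},
]


def files_under(docs, start_path):
    """Return the full documents of every file beneath the folder at start_path."""
    by_id = {doc["fileId"]: doc for doc in docs}
    # Seed the walk with the folder(s) matching the starting path.
    queue = deque(doc["fileId"] for doc in docs if doc.get("path") == start_path)
    seen = set(queue)
    files = []
    while queue:
        folder = by_id[queue.popleft()]
        # Each value of the multi-valued field is followed separately.
        for child_id in folder.get("contents", []):
            if child_id in seen or child_id not in by_id:
                continue
            seen.add(child_id)
            child = by_id[child_id]
            if child["type"] == "file":
                files.append(child)      # keep the whole document, all fields
            else:
                queue.append(child_id)   # descend into the subfolder
    return files


if __name__ == "__main__":
    for doc in files_under(DOCS, "currentFolder"):
        print(doc["fileId"])
```

The sketch follows every value of contents as its own edge and returns whole documents rather than bare node ids, which is the behaviour I am hoping to get from the streaming expression.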
Something interesting is that the Solr documentation shows it does support a 
multi-valued walk, but only if the values are hard-coded:

nodes(emails, walk="john...@apache.org, janesm...@apache.org->from", 
gather="to")

But when using a different stream as the input of the nodes function, it can't 
resolve fields that are multivalued. It can't even properly resolve text fields 
that mimic the example above. If I store a field called refs with a string 
value of "ref-1, ref-2, ref-3", the only match will be on an id of "ref-1" when 
walk="refs->id".
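
For comparison, this is the matching I expected, written as a small Python sketch rather than anything Solr actually does. The helper names walk_keys and expected_matches and the set of ids are hypothetical; the refs value is the one from the paragraph above.

```python
def walk_keys(value):
    """Split a refs value into the individual keys a walk should match on.

    Accepts either a comma-separated string or an already multi-valued list.
    """
    parts = value.split(",") if isinstance(value, str) else list(value)
    return [part.strip() for part in parts if part.strip()]


def expected_matches(refs_value, known_ids):
    """Return every key in refs_value that exists as a document id."""
    return [key for key in walk_keys(refs_value) if key in known_ids]


if __name__ == "__main__":
    ids = {"ref-1", "ref-2", "ref-3", "ref-4"}  # hypothetical document ids
    print(expected_matches("ref-1, ref-2, ref-3", ids))
```

In other words, I expected each of the three values to be treated as a separate key, the same way the hard-coded list in the documentation example is.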

Thanks, I'd appreciate any help
