Hi, I originally posted this question on Stack Overflow, where it was suggested that I use this mailing list instead, so I'm sending it here as well. The original link is below in case you'd rather answer there; I've also copied the question into the body of this email.
https://stackoverflow.com/questions/55130208/using-solr-graph-to-traverse-n-relationships

I'm investigating whether I can use an existing Solr store to do graph traversal. Ideally I would not have to duplicate the data in a graph store. I have been experimenting with Solr's streaming capabilities and the nodes (gatherNodes) stream source. I have three problems with it, and I'm wondering whether anyone has found solutions:

1) Getting the original documents that the nodes reference, with all of their fields. I did eventually solve this by doing an innerJoin between the nodes returned by gatherNodes and a query against "*:*", but this seems less than ideal. Is there a better way to do this? Even better would be if I could do it as an "export" rather than a "select", to better handle large amounts of data. This problem is small compared to the other two, which look like major bugs in Solr.

2) I can't traverse to nodes from a field that has more than one value. The nodes stream source definition takes a walk parameter:

    nodes(collection, search(some search params), walk="ref->id", gather="vals")

In this example it walks from the search results, taking the field "ref" on those docs and finding all nodes whose id matches that value. This works until ref becomes a list of values. Has anyone had success making this work? A simple example would be a tree structure where a folder document has a multiValued field representing its subfolders and files. How would I walk that relationship?

3) In that example, gather returns the nodes represented by the "vals" field on all the nodes that result from the walk. This also does not work if that field is multiValued. Has anyone had any success with this? Going back to the files and folders example, I want to return all the files in the subfolders of the selected folder.
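To pin down the result I'm after in the files and folders example, here is a small in-memory sketch in Python. It is not Solr code and says nothing about how nodes works internally; the documents, ids and function names are made up purely for illustration. It only shows the semantics I would expect from walking a multi-valued "contents" field to "fileId" and keeping the documents of type "file".

```python
from collections import deque

# Hypothetical documents: each has a fileId, a type, and a multi-valued
# "contents" field holding the fileIds of the things it contains.
DOCS = {
    "root": {"fileId": "root", "type": "folder", "contents": ["docs", "readme.txt"]},
    "docs": {"fileId": "docs", "type": "folder", "contents": ["a.txt", "b.txt"]},
    "readme.txt": {"fileId": "readme.txt", "type": "file", "contents": []},
    "a.txt": {"fileId": "a.txt", "type": "file", "contents": []},
    "b.txt": {"fileId": "b.txt", "type": "file", "contents": []},
}


def walk_contents(docs, start_ids):
    """One hop: follow every value of "contents" to the doc with that fileId."""
    found = []
    for doc_id in start_ids:
        for ref in docs[doc_id]["contents"]:
            if ref in docs:
                found.append(docs[ref])
    return found


def files_under(docs, folder_id):
    """Breadth-first walk from a folder, collecting fileIds of type "file"."""
    result = []
    seen = {folder_id}
    queue = deque([folder_id])
    while queue:
        current = queue.popleft()
        for child in walk_contents(docs, [current]):
            child_id = child["fileId"]
            if child_id in seen:
                continue  # guard against cycles and duplicate references
            seen.add(child_id)
            if child["type"] == "file":
                result.append(child_id)
            else:
                queue.append(child_id)  # descend into subfolders
    return result
```

With these made-up documents, files_under(DOCS, "root") should give the three files, including the two inside the "docs" subfolder. That is the behaviour I can't reproduce once the walked field holds more than one value.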
The expression I have in mind is something like this:

    nodes(collection, search(collection, q="path:currentFolder", qt="/select", sort="fileId ASC"), walk="contents->fileId", gather="contents", fq="type:file")

I made this up, so there may be some typos, but the premise is that contents is a multiValued string field and every document, whether of type "file" or "folder", has a fileId, which is what the contents field references. How would I accomplish this? Do these fields need to be indexed in a special way?

One interesting thing: I see in the Solr documentation that nodes does support a multi-valued walk, but only when the values are hard-coded:

    nodes(emails, walk="john...@apache.org, janesm...@apache.org->from", gather="to")

When a different stream is used as the input of the nodes function, however, it can't resolve fields that are multiValued. It can't even properly resolve text fields that mimic the example above: if I store a field called refs with a string value of "ref-1, ref-2, ref-3", then with walk="refs->id" the only match is on an id of "ref-1".

Thanks, I'd appreciate any help.