qt="/export" immediately fixed the query in Question #1. Sorry for missing that in the docs!
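For reference, here is a sketch of the Question #1 join with the handler set on both searches. It uses the same collections and fields as my original message, and I'm assuming nothing else in the expression needs to change:

```text
innerJoin(
  search(triple, q=subject_id:1656521,
    fl="triple_id,subject_id,type_id",
    sort="type_id asc", qt="/export"),
  search(triple_type, q=*:*,
    fl="triple_type_id,triple_type_label",
    sort="triple_type_id asc", qt="/export"),
  on="type_id=triple_type_id"
)
```

With /export there is no need for the rows=10000 workaround, since the handler returns every matching doc, provided the sort and fl fields have docValues set.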
The second query (with /export) crashes the server, so I was going to look at parallelization if you think that's a good idea. It also seems unwise to join into 26M docs, so maybe I can reconfigure the query to run along a happier path :-) The schema is very RDBMS-centric, so maybe that just won't ever work in this framework.

Here's the log, but it's not very helpful.

INFO - 2016-05-13 23:18:13.214; [c:triple s:shard1 r:core_node1 x:triple_shard1_replica1] org.apache.solr.core.SolrCore; [triple_shard1_replica1] webapp=/solr path=/export params={q=*:*&distrib=false&fl=triple_id,subject_id,type_id&sort=type_id+asc&wt=json&version=2.2} hits=26305619 status=0 QTime=61
INFO - 2016-05-13 23:18:13.747; [c:triple_type s:shard1 r:core_node1 x:triple_type_shard1_replica1] org.apache.solr.core.SolrCore; [triple_type_shard1_replica1] webapp=/solr path=/export params={q=*:*&distrib=false&fl=triple_type_id,triple_type_label&sort=triple_type_id+asc&wt=json&version=2.2} hits=702 status=0 QTime=2
INFO - 2016-05-13 23:18:48.504; [ ] org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@6ad0f304 name:ZooKeeperConnection Watcher:localhost:9983 got event WatchedEvent state:Disconnected type:None path:null path:null type:None
INFO - 2016-05-13 23:18:48.504; [ ] org.apache.solr.common.cloud.ConnectionManager; zkClient has disconnected
ERROR - 2016-05-13 23:18:51.316; [c:triple s:shard1 r:core_node1 x:triple_shard1_replica1] org.apache.solr.common.SolrException; null:Early Client Disconnect
WARN - 2016-05-13 23:18:51.431; [ ] org.apache.zookeeper.ClientCnxn$SendThread; Session 0x154ac66c81e0002 for server localhost/0:0:0:0:0:0:0:1:9983, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

On Fri, May 13, 2016 at 3:09 PM, Joel Bernstein <joels...@gmail.com> wrote:

> A couple of other things:
>
> 1) Your innerJoin can be parallelized across workers to improve performance.
> Take a look at the docs on the parallel function for the details.
>
> 2) It looks like you might be doing graph operations with joins. You might
> want to take a look at the gatherNodes function coming in 6.1:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62693238
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, May 13, 2016 at 5:57 PM, Joel Bernstein <joels...@gmail.com> wrote:
>
> > When doing things that require all the results (like joins) you need to
> > specify the /export handler in the search function.
> >
> > qt="/export"
> >
> > The search function defaults to the /select handler, which is designed to
> > return the top N results. The /export handler always returns all results
> > that match the query. Also keep in mind that the /export handler requires
> > that sort fields and fl fields have docValues set.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, May 13, 2016 at 5:36 PM, Ryan Cutter <ryancut...@gmail.com> wrote:
> >
> >> Question #1:
> >>
> >> triple_type collection has a few hundred docs and triple has 25M docs.
> >>
> >> When I search for a particular subject_id in triple which I know has 14
> >> results and do not pass in 'rows' params, it returns 0 results:
> >>
> >> innerJoin(
> >>   search(triple, q=subject_id:1656521,
> >>     fl="triple_id,subject_id,type_id",
> >>     sort="type_id asc"),
> >>   search(triple_type, q=*:*, fl="triple_type_id,triple_type_label",
> >>     sort="triple_type_id asc"),
> >>   on="type_id=triple_type_id"
> >> )
> >>
> >> When I do the same search with rows=10000, it returns 14 results:
> >>
> >> innerJoin(
> >>   search(triple, q=subject_id:1656521,
> >>     fl="triple_id,subject_id,type_id",
> >>     sort="type_id asc", rows=10000),
> >>   search(triple_type, q=*:*, fl="triple_type_id,triple_type_label",
> >>     sort="triple_type_id asc", rows=10000),
> >>   on="type_id=triple_type_id"
> >> )
> >>
> >> Am I doing this right? Is there a magic number to pass into rows which
> >> says "give me all the results which match this query"?
> >>
> >>
> >> Question #2:
> >>
> >> Perhaps related to the first question, but I want to run the innerJoin()
> >> without the subject_id - rather have it use the results of another query.
> >> But this does not return any results. I'm saying "search for this entity
> >> based on id, then use that result's entity_id as the subject_id to look
> >> through the triple/triple_type collections":
> >>
> >> hashJoin(
> >>   innerJoin(
> >>     search(triple, q=*:*, fl="triple_id,subject_id,type_id",
> >>       sort="type_id asc"),
> >>     search(triple_type, q=*:*, fl="triple_type_id,triple_type_label",
> >>       sort="triple_type_id asc"),
> >>     on="type_id=triple_type_id"
> >>   ),
> >>   hashed=search(entity,
> >>     q=id:"urn:sid:entity:455dfa1aa27eedad21ac2115797c1580bb3b3b4e",
> >>     fl="entity_id,entity_label", sort="entity_id asc"),
> >>   on="subject_id=entity_id"
> >> )
> >>
> >> Am I doing this hashJoin right?
> >>
> >> Thanks very much, Ryan