Hi,

After a quick glance at DRILL-3929, I should clarify that this is "only" about pushing down the join filter and finding an efficient way to do the join that does not require a full scan.
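
To make the distinction concrete, here is a minimal sketch in plain Java of a scan-based join next to an index-lookup join. It is not Drill or Lucene code: the class and method names (IndexJoinSketch, Event, Meta, scanJoin, buildIndex, indexJoin) are hypothetical, and a HashMap stands in for the index as an assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy comparison of a scan-based join and an index-lookup join (hypothetical names). */
public class IndexJoinSketch {

    /** Probe-side row: a join key plus some payload. */
    static final class Event {
        final String key;
        final String payload;
        Event(String key, String payload) { this.key = key; this.payload = payload; }
    }

    /** Metadata-side row, standing in for an indexed document with stored fields. */
    static final class Meta {
        final String key;
        final String label;
        Meta(String key, String label) { this.key = key; this.label = label; }
    }

    /** Scan-based join: every probe row walks the whole metadata side. */
    static List<String> scanJoin(List<Event> events, List<Meta> metadata) {
        List<String> out = new ArrayList<>();
        for (Event e : events) {
            for (Meta m : metadata) {              // full scan per probe row
                if (m.key.equals(e.key)) {
                    out.add(e.payload + ":" + m.label);
                }
            }
        }
        return out;
    }

    /** Stand-in for an index: join key -> metadata rows with that key. */
    static Map<String, List<Meta>> buildIndex(List<Meta> metadata) {
        Map<String, List<Meta>> index = new HashMap<>();
        for (Meta m : metadata) {
            index.computeIfAbsent(m.key, k -> new ArrayList<>()).add(m);
        }
        return index;
    }

    /** Index join: the join condition becomes a key lookup, so only matching rows are read. */
    static List<String> indexJoin(List<Event> events, Map<String, List<Meta>> index) {
        List<String> out = new ArrayList<>();
        for (Event e : events) {
            for (Meta m : index.getOrDefault(e.key, Collections.<Meta>emptyList())) {
                out.add(e.payload + ":" + m.label);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Meta> metadata = Arrays.asList(new Meta("a", "Alpha"), new Meta("b", "Beta"));
        List<Event> events = Arrays.asList(
                new Event("a", "e1"), new Event("c", "e2"), new Event("b", "e3"));
        System.out.println(scanJoin(events, metadata));
        System.out.println(indexJoin(events, buildIndex(metadata)));
    }
}
```

Both methods return the same rows; the difference is that the scan variant reads every metadata row for every probe row, while the lookup variant reads only the rows that match the join key.
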
We are not using Lucene as an external index for a separate data source, i.e. the Lucene index contains all the information we need for the join (stored fields). I guess this would make more sense to people if we said we were using Solr or Elasticsearch, but this use case is not as complex as the one detailed in DRILL-3929.

Regards,
-Stefan

On Sat, Jan 16, 2016 at 8:11 PM, Stefán Baxter <[email protected]> wrote:

> Hi Jacques,
>
> Thank you for taking the time, it's appreciated.
>
> I'm trying to contribute to the Lucene reader for Drill (started by Rahul
> Challapalli). We would like to use it for storage of metadata used in our
> Drill setup.
> This is perfectly suited to our needs, as the metadata is already
> available in Lucene documents and indexes and it's tenant-specific (so this
> is not the global metadata that should reside in Postgres/HBase or
> something similar).
>
> I think it's best that I confess that I'm not sure what I'm looking for or
> how to ask for it, at least not in proper Drill terms.
>
> The Lucene reader is working, but the joins currently rely on a full scan,
> which makes execution ~20 times slower on simple data sets (a few million
> records), so I need to get the index-based joins going but I don't know
> how.
>
> We have resources to do this now, but our knowledge of Drill is limited and
> I could not, in my initial scan of the project, find any use of
> DrillJoinRel that indicated indexes were involved (please forgive me if
> this is a false assumption).
>
> Can you please clarify things for me a bit:
>
> - Is the JDBC connector already doing proper pushdown of filters for
>   joins? (If so, then I must really get my reading glasses on.)
> - What will change with this new approach?
>
> I'm not really sure what you need from me now, but I'm more than happy to
> share everything except the data itself :).
>
> The fork is placed here:
> https://github.com/activitystream/drill/tree/lucene-work but no test
> files are included in the repo, sorry, and this is all very immature.
>
> Regards,
> -Stefán
>
> On Sat, Jan 16, 2016 at 7:46 PM, Jacques Nadeau <[email protected]>
> wrote:
>
>> The closest things done to date are the join pushdown in the JDBC
>> connector and the prototype code someone built a while back to do a join
>> using HBase as a hash table. Aman and I have an ongoing thread discussing
>> using elastic indexing and sideband communication to accelerate joins. It
>> would be great if you could cover exactly what you're doing (including
>> relevant stats); that would give us a better idea of how to point you in
>> the right direction.
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Sat, Jan 16, 2016 at 5:18 AM, Stefán Baxter <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > Can anyone point me to an implementation where joins are implemented
>> > with full support for filters and efficient handling of joins based on
>> > indexes?
>> >
>> > The only code I have come across all seems to rely on a complete scan
>> > of the related table, and that is not acceptable for the use case we
>> > are working on (Lucene reader).
>> >
>> > Regards,
>> > -Stefán
