Roman,

    The video was very helpful, and I realized block joins would be a
great fit for my problem. However, I got worried about the size of the
block: I could have 10 million children for a single parent, for instance.
Although this could stay in the same shard, do you think it would be a
major problem at _query time_? Of course indexing would take longer, but if
queries are faster, it would be a great fit for my case.
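On the query-time question, it may help to picture how a block join turns child hits into parent hits. The following is a simplified Python model, not Lucene's code. It assumes, as Lucene's block join does, that each block is indexed with its child documents first and the parent document last, so a child's parent is the next parent document after it. The sample documents and the helper names (`next_parent`, `to_parent_join`) are hypothetical.

```python
from bisect import bisect_left

# Hypothetical flat index: each block holds its child pages followed by the
# parent user, mirroring the child-first, parent-last block layout.
docs = [
    {"type": "page", "url": "www.a.com", "title": "solr tips"},  # 0, child of u1
    {"type": "page", "url": "b.com", "title": "lucene"},         # 1, child of u1
    {"type": "user", "id": "u1"},                                 # 2, parent
    {"type": "page", "url": "www.c.com", "title": "cooking"},    # 3, child of u2
    {"type": "page", "url": "d.com", "title": "solr news"},      # 4, child of u2
    {"type": "user", "id": "u2"},                                 # 5, parent
]

# Sorted ids of parent documents (stands in for a parent filter bitset).
parent_ids = [i for i, d in enumerate(docs) if d["type"] == "user"]

def next_parent(doc_id):
    """Return the id of the first parent document at or after doc_id."""
    return parent_ids[bisect_left(parent_ids, doc_id)]

def to_parent_join(child_hits):
    """Map matching child doc ids to the user ids of their parents."""
    return {docs[next_parent(c)]["id"] for c in child_hits}

def page_matches(d):
    # Both conditions are evaluated on the same child (page) document.
    return (d["type"] == "page" and d["url"].startswith("www.")
            and "solr" in d["title"])

hits = [i for i, d in enumerate(docs) if page_matches(d)]
print(to_parent_join(hits))  # {'u1'}
```

In this model the join step costs one binary search per matching child, so a block with 10 million children mostly matters when many of those children match the child query; real Lucene performance would still need to be measured on actual data.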

Best regards,
Marcelo.


2013/7/8 Roman Chyla <roman.ch...@gmail.com>

> Hello,
>
> Joins are not the only option; you may also write your own function
> (a ValueSource) that implements your logic. However, I think you should
> not dismiss the regex idea as too slow before trying it out, because it
> can be faster than the joins. The catch is that the number of entities
> needs to be limited; see Jack Krupansky's recent replies on the number of
> fields.
>
> There are different kinds of joins; I recommend this video to see their
> differences: http://vimeo.com/44299232
>
> If your data relations fit in memory, a smart cache (i.e. an [un]inverted
> index) will always outperform Lucene joins; see the chart in this document:
> http://code4lib.org/files/2ndOrderOperatorsv2.pdf
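Roman's "smart cache" point can be pictured as an in-memory array that maps each page document to its owning user (an uninverted, per-document view), used alongside the normal inverted index for the page attributes. This is a hypothetical Python sketch with made-up data, not the approach from the linked PDF: the two conditions are intersected per page, and the page-to-user step is an array lookup instead of a join query.

```python
# Hypothetical page documents: (doc_id, url, title, user_id), doc ids 0..n-1.
pages = [
    (0, "www.a.com", "solr tips", "u1"),
    (1, "b.com", "lucene", "u1"),
    (2, "www.c.com", "cooking", "u2"),
    (3, "d.com", "solr news", "u2"),
]

# "Uninverted" cache: index = page doc id, value = owning user id.
# Built once (e.g. after a commit) and kept in memory.
page_to_user = [user for _, _, _, user in pages]

# Inverted-index style postings for each condition: set of page doc ids.
url_prefix_www = {doc for doc, url, _, _ in pages if url.startswith("www.")}
title_has_solr = {doc for doc, _, title, _ in pages if "solr" in title.split()}

def matching_users():
    # Intersect at the page level so both conditions hold on the same page,
    # then translate page ids to users with plain array lookups.
    same_page_hits = url_prefix_www & title_has_solr
    return {page_to_user[doc] for doc in same_page_hits}

print(matching_users())  # {'u1'}
```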
>
> roman
>
>
> On Mon, Jul 8, 2013 at 4:03 PM, Marcelo Elias Del Valle
> <mvall...@gmail.com> wrote:
>
> > Hello all,
> >
> >     I am currently using SolrCloud and have the following need:
> >
> >    - My queries focus on counting how many users meet certain criteria,
> >    so my main document is "user" (the parent table).
> >    - Each user can access several web pages (a child table), and each
> >    web page might have several attributes.
> >    - I need to look up users who accessed at least one page that matches
> >    a set of attributes. For example, consider these two scenarios:
> >       1. If a user accessed a web page WP1 whose URL starts with "www."
> >       and whose title includes "solr", then the user is a match.
> >       2. However, if the user accessed a web page WP1 with such a URL and
> >       ANOTHER page WP2 whose title includes "solr", this is not a match,
> >       because no single page satisfies both conditions.
> >
> >
> >     If I were modeling this in a relational DB, user would be one table
> > and URLs would be another. However, as I am using Solr, my first option
> > would be to denormalize. Simply storing all the fields in the user
> > document wouldn't work, as it would incorrectly match scenario 2.
> >      I thought of two solutions for this:
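Scenario 2 is the usual false positive of flattening child records into multi-valued parent fields: each condition is checked against the whole field, so different pages can satisfy different conditions. This Python sketch uses hypothetical field names (`page_urls`, `page_titles`) and made-up data to contrast the flattened check with the per-page check the requirement calls for.

```python
# u1 has one page satisfying both conditions (scenario 1);
# u2 has one page with the URL and another with the title (scenario 2).
users = {
    "u1": [{"url": "www.a.com", "title": "solr tips"}],
    "u2": [{"url": "www.c.com", "title": "cooking"},
           {"url": "d.com", "title": "solr news"}],
}

def page_ok(page):
    return page["url"].startswith("www.") and "solr" in page["title"]

def per_page_match(pages):
    # Required semantics: some single page must satisfy both conditions.
    return any(page_ok(p) for p in pages)

def flatten(pages):
    # Denormalized user document: one multi-valued field per page attribute.
    return {"page_urls": [p["url"] for p in pages],
            "page_titles": [p["title"] for p in pages]}

def flattened_match(doc):
    # Each condition is checked against the whole field independently,
    # so the two conditions may be satisfied by different pages.
    return (any(u.startswith("www.") for u in doc["page_urls"])
            and any("solr" in t for t in doc["page_titles"]))

for user, pages in users.items():
    print(user, per_page_match(pages), flattened_match(flatten(pages)))
# u1 True True
# u2 False True   <- scenario 2: false match after flattening
```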
> >
> >    - Using the idea of an inverted index: having several kinds of
> >    documents (user, web page, entity 3, entity 4, etc.), where each child
> >    entity (a web page, for instance) has a field relating it to the user
> >    id. Then, I would use a cross join in Solr to get the results where
> >    there is a match on the user (parent table) and also on each child
> >    entity (in other words, to merge the results of several queries that
> >    might return user ids). The drawback is that this requires a join.
> >    - Having just a user document and storing each web page as a single
> >    field value (like a JSON string). To search, the same field value
> >    would need to match a regular expression that includes both
> >    conditions. This would make my search slower, and I would not be able
> >    to apply the same technique if the child tables also had children.
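Both proposed solutions can be prototyped outside Solr to confirm that they keep the "same page" semantics. In this Python sketch with made-up data, `join_style` mimics what a Solr join such as `{!join from=user_id to=id}` over the page documents would return, assuming a hypothetical `user_id` field on each page. `regex_style` assumes each page is serialized into one value of a multi-valued field as a hypothetical `url=...|title=...` string, and uses `re.fullmatch` to stand in for a regex query, which in Lucene must match a whole indexed term.

```python
import re

# Hypothetical child documents, each carrying the id of its user.
page_docs = [
    {"user_id": "u1", "url": "www.a.com", "title": "solr tips"},
    {"user_id": "u2", "url": "www.c.com", "title": "cooking"},
    {"user_id": "u2", "url": "d.com", "title": "solr news"},
]
user_ids = {"u1", "u2"}  # results of the query on the user (parent) documents

def join_style():
    # Child query: both conditions on the same page document.
    child_hits = [d for d in page_docs
                  if d["url"].startswith("www.") and "solr" in d["title"]]
    # Join step: keep users whose id appears in the hits' user_id values.
    return user_ids & {d["user_id"] for d in child_hits}

# Regex style: one multi-valued field on the user document, one value per page.
serialized = {}
for d in page_docs:
    serialized.setdefault(d["user_id"], []).append(
        f"url={d['url']}|title={d['title']}")

# Both conditions live in one pattern, so they must hold within one value.
PAGE_PATTERN = re.compile(r"url=www\.[^|]*\|title=[^|]*solr[^|]*")

def regex_style():
    return {user for user, values in serialized.items()
            if any(PAGE_PATTERN.fullmatch(v) for v in values)}

print(join_style(), regex_style())  # {'u1'} {'u1'}
```

The `|` separator is an assumption; real URLs or titles containing it would need escaping, and the regex approach still does not extend naturally to grandchildren.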
> >
> >     Am I missing any obvious solution here? I would love to receive
> > criticism on this, as I am probably not the only one who has this
> > problem. I would also like more ideas on how to denormalize data in this
> > case. Is a join my best option here?
> >
> > Best regards,
> > --
> > Marcelo Elias Del Valle
> > http://mvalle.com - @mvallebr
> >
>



-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
