Re: Difference between hashJoin and innerJoin in Streaming Expression

2017-03-25 Thread Zheng Lin Edwin Yeo
Hi Joel,

Thanks for the information.

Regards,
Edwin




Re: Difference between hashJoin and innerJoin in Streaming Expression

2017-03-24 Thread Joel Bernstein
The innerJoin is a merge join and the hashJoin is a hash join.

The merge join can support joins of unlimited size and never runs out of
memory. But it requires that both sides of the join are sorted on the join
keys.
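
As a rough illustration of the merge join described above, here is a Python sketch. It is not Solr's actual implementation; the function name `merge_join` and the use of dicts as tuples are assumptions made for this example. Both inputs must already be sorted ascending on the join key, and only the current run of equal-key tuples from one side is held in memory.

```python
def merge_join(left, right, key):
    """Inner-join two iterables of dicts, each sorted ascending on `key`."""
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        if l[key] < r[key]:
            l = next(left_it, None)      # left is behind, advance it
        elif l[key] > r[key]:
            r = next(right_it, None)     # right is behind, advance it
        else:
            k = l[key]
            # Buffer only the right-side tuples that share this key.
            run = []
            while r is not None and r[key] == k:
                run.append(r)
                r = next(right_it, None)
            # Pair every left tuple with this key against the buffered run.
            while l is not None and l[key] == k:
                for match in run:
                    # On a field-name clash the right value wins in this sketch.
                    yield {**l, **match}
                l = next(left_it, None)
```

Because each side is consumed once, in order, memory use depends on the largest group of tuples sharing one key rather than on the total size of either input.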

The hash join reads one side of the join into a hash map keyed on the join
keys. This doesn't require any specific sort but it is limited in size by
how much data can fit in the hash map.
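
A comparable Python sketch of the hash join, again only an illustration and not Solr's code (the names `hash_join`, `stream`, and `hashed` are assumptions for this example). One side is loaded completely into an in-memory map keyed on the join key, and the other side is streamed past it in whatever order it arrives.

```python
from collections import defaultdict

def hash_join(stream, hashed, key):
    """Inner-join two iterables of dicts; neither needs to be sorted."""
    # Read the whole hashed side into memory, grouped by join key.
    table = defaultdict(list)
    for t in hashed:
        table[t[key]].append(t)
    # Stream the other side and look up matches one tuple at a time.
    for t in stream:
        for match in table.get(t[key], ()):
            # On a field-name clash the hashed-side value wins in this sketch.
            yield {**t, **match}
```

The map holds every tuple from the hashed side, so that side has to fit in memory; in a sketch like this it makes sense to hash the smaller input.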

You can parallelize both joins using the parallel function to improve
scalability and performance.

Joel Bernstein
http://joelsolr.blogspot.com/



Difference between hashJoin and innerJoin in Streaming Expression

2017-03-24 Thread Zheng Lin Edwin Yeo
Hi,

What is the main difference between hashJoin and innerJoin in Solr
Streaming Expression?

I understand that both will emit a tuple containing the fields of both
tuples.

When I tried both hashJoin and innerJoin with the same query, I got exactly
the same results, and there was no difference in performance.

Under what circumstances should we use hashJoin, and under what
circumstances should we use innerJoin?

Regards,
Edwin