Re: Include JSON facet inside Solr Streaming

2017-07-01 Thread Susheel Kumar
Yes. In general, any expression can be nested inside other expressions or
stream sources.
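
As a rough sketch of that nesting, the snippet below builds a hashJoin over
two facet expressions and the matching /stream request URL. It only assembles
strings and does not contact Solr. The collection names, bucket fields and
join key are assumptions borrowed from the query quoted further down, and the
stock /solr/<collection>/stream path is assumed rather than the /edm path in
that query.

```python
from urllib.parse import urlencode

def facet_expr(collection, bucket, metric):
    """Build a facet() streaming expression (JSON facet under the covers)."""
    return (
        f'facet({collection}, q="*:*", buckets="{bucket}", '
        f'bucketSorts="{bucket} asc", bucketSizeLimit=100, {metric})'
    )

def hash_join_expr(left, right, on):
    """Nest two stream expressions inside hashJoin(); `right` is the hashed side."""
    return f'hashJoin({left}, hashed={right}, on="{on}")'

# Hypothetical collections and fields, taken from the query in this thread.
expr = hash_join_expr(
    facet_expr("collection1", "field1a", "sum(totalAmount1)"),
    facet_expr("collection2", "field2a", "sum(totalAmount2)"),
    on="field1a=field2a",
)

# The expression is passed URL-encoded in the expr parameter of /stream.
url = "http://localhost:8983/solr/collection1/stream?" + urlencode({"expr": expr})
print(expr)
```

The aggregation lives in the facet expression itself here, so no json.facet
parameter has to be squeezed into fq.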

On Sat, Jul 1, 2017 at 1:43 AM, Zheng Lin Edwin Yeo 
wrote:

> Is it possible to do a Join (Eg: hashJoin, innerJoin) on the facet stream
> expression?
>
> Regards,
> Edwin
>
> On 1 July 2017 at 03:30, Susheel Kumar  wrote:
>
> > I doubt it can work.  Why not use the facet stream expression, which uses
> > JSON facets under the covers?
> >
> > On Thu, Jun 29, 2017 at 9:44 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Is it currently possible to include JSON facet inside Solr Streaming?
> > >
> > > I am trying out the following query, which combines a JSON facet
> > > with the hashJoin from Streaming, but we get an error saying that it is
> > > not a proper expression clause.
> > >
> > > If it is possible, what should be the correct way to include it?
> > >
> > > I'm using Solr 6.5.1.
> > >
> > > http://localhost:8983/edm/collection1/stream?expr=hashJoin(
> > >   search(collection1,
> > >     q="id:",
> > >     fq="{!child of="contentType_s:collection1Header"}field1a:*&json.facet={TotalAmount1:"sum(totalAmount1)"}",
> > >     fl="field1a,field1b,field1c,field1d",
> > >     sort="field1a asc",
> > >     qt="/export"),
> > >   hashed=search(collection2,
> > >     q="id:",
> > >     fq="json.facet={TotalAmount2:"sum(totalAmount2)"}",
> > >     fl="field2a,field2b,field2c,field2d",
> > >     sort="field2a asc",
> > >     qt="/export"),
> > >   on="field1a=field1b"
> > > )=true
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> >
>


Re: Allow Join over two sharded collection

2017-07-01 Thread Glick, David
Unsubscribe 

Sent from my iPhone

> On Jul 1, 2017, at 8:02 PM, Susheel Kumar  wrote:
> 
> Depending on your use case people also use collection aliasing for time
> series data.  See below
> 
> https://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/
> 
>> On Sat, Jul 1, 2017 at 7:13 PM, Susheel Kumar  wrote:
>> 
>> As Erick said, 1M docs/month isn't a big deal.  I have 45+ million docs in
>> one shard, but YMMV depending on other factors.
>> 
>> Also, there is a lot of confusion in the terminology. The default routing
>> is compositeId routing.  The implicit routing which Erick mentioned is the
>> manual routing.  https://issues.apache.org/jira/browse/SOLR-6630
>> 
>> Which routing are you suggesting to use? Can you clarify again?  Also,
>> what's your exact use case?  Do you query older documents, or are most or
>> all of your queries supposed to go to the shard with newer documents?
>> 
>> Thanks,
>> Susheel
>> 
>> On Sat, Jul 1, 2017 at 12:14 PM, Erick Erickson 
>> wrote:
>> 
>>> 1M docs/month shouldn't make Solr break a sweat. If it really worries
>>> you and you're indexing in a big batch, index during off hours. At
>>> very worst, if you're ingesting them all at once you might have to
>>> throttle the indexing a bit.
>>> 
>>> Frankly, most of the time acquiring the documents from the system of
>>> record is where the bottleneck is and Solr easily handles the indexing
>>> load.
>>> 
>>> The other advantage is that if you use implicit routing rather than a
>>> composite ID, you can add shards to your collection one at a time as
>>> required, for time-series data that's an elegant way to "age out" old
>>> documents.
>>> 
>>> Best,
>>> Erick
>>> 
>>> On Sat, Jul 1, 2017 at 8:57 AM, mganeshs  wrote:
>>>> Hi Susheel,
>>>> 
>>>> Currently we already have around 20M documents, and from now on we
>>>> expect about 1M new documents every month.
>>>> The reason we don't want to go for time-based implicit routing is that
>>>> all new documents will end up in the most recent shard, so indexing
>>>> will be heavy for the new shard, whereas older shards will be used just
>>>> for queries.
>>>> If we have default sharding, then the indexing load is distributed
>>>> across all the shards. That's the reason we would like to stick to
>>>> default sharding. But Join is the issue here when default sharding is
>>>> used :-(
>>>> 
>>>> 
>>>> 
>>>> --
>>>> View this message in context: http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4343803.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> 


Re: Allow Join over two sharded collection

2017-07-01 Thread Susheel Kumar
Depending on your use case people also use collection aliasing for time
series data.  See below

https://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/
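
A minimal sketch of what that could look like with the Collections API,
assuming one collection per month. The base URL, the alias name and the
collection names are made up for illustration; the snippet only builds the
request URL.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL

def create_alias_url(alias, collections):
    """Collections API call that points `alias` at one or more collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"

# Hypothetical monthly collections. Re-issuing CREATEALIAS with a new list
# replaces the alias, so old months can be dropped from the query path.
alias_url = create_alias_url("docs_recent", ["docs_2017_06", "docs_2017_07"])
print(alias_url)
```

Queries then go to the alias, and only the alias definition changes as new
months are added.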

On Sat, Jul 1, 2017 at 7:13 PM, Susheel Kumar  wrote:

> As Erick said, 1M docs/month isn't a big deal.  I have 45+ million docs in
> one shard, but YMMV depending on other factors.
>
> Also, there is a lot of confusion in the terminology. The default routing
> is compositeId routing.  The implicit routing which Erick mentioned is the
> manual routing.  https://issues.apache.org/jira/browse/SOLR-6630
>
> Which routing are you suggesting to use? Can you clarify again?  Also,
> what's your exact use case?  Do you query older documents, or are most or
> all of your queries supposed to go to the shard with newer documents?
>
> Thanks,
> Susheel
>
> On Sat, Jul 1, 2017 at 12:14 PM, Erick Erickson 
> wrote:
>
>> 1M docs/month shouldn't make Solr break a sweat. If it really worries
>> you and you're indexing in a big batch, index during off hours. At
>> very worst, if you're ingesting them all at once you might have to
>> throttle the indexing a bit.
>>
>> Frankly, most of the time acquiring the documents from the system of
>> record is where the bottleneck is and Solr easily handles the indexing
>> load.
>>
>> The other advantage is that if you use implicit routing rather than a
>> composite ID, you can add shards to your collection one at a time as
>> required, for time-series data that's an elegant way to "age out" old
>> documents.
>>
>> Best,
>> Erick
>>
>> On Sat, Jul 1, 2017 at 8:57 AM, mganeshs  wrote:
>> > Hi Susheel,
>> >
>> > Currently we already have around 20M documents, and from now on we
>> > expect about 1M new documents every month.
>> > The reason we don't want to go for time-based implicit routing is that
>> > all new documents will end up in the most recent shard, so indexing
>> > will be heavy for the new shard, whereas older shards will be used just
>> > for queries.
>> > If we have default sharding, then the indexing load is distributed
>> > across all the shards. That's the reason we would like to stick to
>> > default sharding. But Join is the issue here when default sharding is
>> > used :-(
>> >
>> >
>> >
>> > --
>> > View this message in context: http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4343803.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>


Re: Allow Join over two sharded collection

2017-07-01 Thread Susheel Kumar
As Erick said, 1M docs/month isn't a big deal.  I have 45+ million docs in
one shard, but YMMV depending on other factors.

Also, there is a lot of confusion in the terminology. The default routing is
compositeId routing.  The implicit routing which Erick mentioned is the
manual routing.  https://issues.apache.org/jira/browse/SOLR-6630

Which routing are you suggesting to use? Can you clarify again?  Also,
what's your exact use case?  Do you query older documents, or are most or
all of your queries supposed to go to the shard with newer documents?

Thanks,
Susheel

On Sat, Jul 1, 2017 at 12:14 PM, Erick Erickson 
wrote:

> 1M docs/month shouldn't make Solr break a sweat. If it really worries
> you and you're indexing in a big batch, index during off hours. At
> very worst, if you're ingesting them all at once you might have to
> throttle the indexing a bit.
>
> Frankly, most of the time acquiring the documents from the system of
> record is where the bottleneck is and Solr easily handles the indexing
> load.
>
> The other advantage is that if you use implicit routing rather than a
> composite ID, you can add shards to your collection one at a time as
> required, for time-series data that's an elegant way to "age out" old
> documents.
>
> Best,
> Erick
>
> On Sat, Jul 1, 2017 at 8:57 AM, mganeshs  wrote:
> > Hi Susheel,
> >
> > Currently we already have around 20M documents, and from now on we
> > expect about 1M new documents every month.
> > The reason we don't want to go for time-based implicit routing is that
> > all new documents will end up in the most recent shard, so indexing
> > will be heavy for the new shard, whereas older shards will be used just
> > for queries.
> > If we have default sharding, then the indexing load is distributed
> > across all the shards. That's the reason we would like to stick to
> > default sharding. But Join is the issue here when default sharding is
> > used :-(
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4343803.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Allow Join over two sharded collection

2017-07-01 Thread Erick Erickson
1M docs/month shouldn't make Solr break a sweat. If it really worries
you and you're indexing in a big batch, index during off hours. At
very worst, if you're ingesting them all at once you might have to
throttle the indexing a bit.

Frankly, most of the time acquiring the documents from the system of
record is where the bottleneck is and Solr easily handles the indexing
load.

The other advantage is that if you use implicit routing rather than a
composite ID, you can add shards to your collection one at a time as
required, for time-series data that's an elegant way to "age out" old
documents.
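
A sketch of the Collections API calls this would involve, for illustration
only. The base URL, the collection name "timeseries", the shard names and the
routing field "month_s" are all hypothetical, and a real CREATE call may need
further parameters (config name, replication factor, maxShardsPerNode).

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL

def collections_api(params):
    """Build a Collections API URL from a dict of parameters."""
    return f"{SOLR}/admin/collections?{urlencode(params)}"

# Create a collection with the implicit router and one named shard per month.
create = collections_api({
    "action": "CREATE",
    "name": "timeseries",
    "router.name": "implicit",
    "router.field": "month_s",  # documents are routed by this field's value
    "shards": "2017_06,2017_07",
})

# Later: add a shard for the next month, and drop the oldest one to age it out.
add_shard = collections_api(
    {"action": "CREATESHARD", "collection": "timeseries", "shard": "2017_08"})
drop_shard = collections_api(
    {"action": "DELETESHARD", "collection": "timeseries", "shard": "2017_06"})

for url in (create, add_shard, drop_shard):
    print(url)
```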

Best,
Erick

On Sat, Jul 1, 2017 at 8:57 AM, mganeshs  wrote:
> Hi Susheel,
>
> Currently we already have around 20M documents, and from now on we expect
> about 1M new documents every month.
> The reason we don't want to go for time-based implicit routing is that all
> new documents will end up in the most recent shard, so indexing will be
> heavy for the new shard, whereas older shards will be used just for queries.
> If we have default sharding, then the indexing load is distributed across
> all the shards. That's the reason we would like to stick to default
> sharding. But Join is the issue here when default sharding is used :-(
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4343803.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Allow Join over two sharded collection

2017-07-01 Thread mganeshs
Hi Susheel,

Currently we already have around 20M documents, and from now on we expect
about 1M new documents every month.
The reason we don't want to go for time-based implicit routing is that all
new documents will end up in the most recent shard, so indexing will be
heavy for the new shard, whereas older shards will be used just for queries.
If we have default sharding, then the indexing load is distributed across
all the shards. That's the reason we would like to stick to default
sharding. But Join is the issue here when default sharding is used :-(
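
One possible way around this, sketched here only as an assumption and not
something we have tried, would be a streaming innerJoin, which reads from
every shard of both collections. Both inner streams have to be sorted on the
join key. The collection and field names below are hypothetical, and the
snippet only builds the expression string.

```python
def export_search(collection, fields, sort_key):
    """search() over the /export handler, sorted on the join key."""
    fl = ",".join(fields)
    return (f'search({collection}, q="*:*", fl="{fl}", '
            f'sort="{sort_key} asc", qt="/export")')

def inner_join(left, right, on):
    """innerJoin() merges two streams that are sorted on the same key."""
    return f'innerJoin({left}, {right}, on="{on}")'

# Hypothetical collections "orders" and "customers" joined on customer_id_s.
join_expr = inner_join(
    export_search("orders", ["order_id_s", "customer_id_s"], "customer_id_s"),
    export_search("customers", ["customer_id_s", "name_s"], "customer_id_s"),
    on="customer_id_s",
)
print(join_expr)
```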



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4343803.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr suggester query with quotes produces different results

2017-07-01 Thread Angel Todorov
Hi guys,

I have the Suggester configured using the FreeTextLookupFactory. I noticed
that if I don't use quotation marks, I only get single-term results. If I use
quotation marks around my query, then I only get results that consist
of multiple terms. There is no configuration that would return both types
of results with a single query.
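
The only workaround I can think of is to send both forms of the query and
merge the results on the client. A rough sketch, assuming a /suggest handler
and a dictionary name of my own choosing; the sample suggestions at the
bottom are made up just to show the merge.

```python
from urllib.parse import urlencode

def suggest_urls(base, dictionary, text):
    """Return the unquoted and the quoted variant of the same suggest request."""
    urls = []
    for q in (text, f'"{text}"'):
        params = {"suggest": "true", "suggest.dictionary": dictionary, "suggest.q": q}
        urls.append(f"{base}/suggest?{urlencode(params)}")
    return urls

def merge_suggestions(*result_lists):
    """Merge suggestion dicts by term, keeping the highest weight per term."""
    best = {}
    for results in result_lists:
        for s in results:
            term = s["term"]
            if term not in best or s["weight"] > best[term]["weight"]:
                best[term] = s
    return sorted(best.values(), key=lambda s: -s["weight"])

# Hypothetical core name and dictionary name.
urls = suggest_urls("http://localhost:8983/solr/mycore", "freeTextSuggester", "solr")

# Made-up sample results standing in for the two parsed responses.
single = [{"term": "solr", "weight": 10}, {"term": "solar", "weight": 4}]
multi = [{"term": "solr cloud", "weight": 7}, {"term": "solr", "weight": 3}]
merged = merge_suggestions(single, multi)
print([s["term"] for s in merged])
```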

Thanks
Angel