RE: solr alias not working on streaming query search

2017-07-06 Thread Lewin Joy (TMS)
Oh, Cool. Thank you, Joel.
I am using Solr 6.1 where I am still facing the issue.

Anyway, nice to know that this is fixed in versions 6.4 and later.

Thanks,
Lewin

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Wednesday, July 05, 2017 12:58 PM
To: solr-user@lucene.apache.org
Subject: Re: solr alias not working on streaming query search

This should be fixed in Solr 6.4:
https://issues.apache.org/jira/browse/SOLR-9077

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jul 5, 2017 at 2:40 PM, Lewin Joy (TMS) 
wrote:

> ** PROTECTED 関係者外秘
>
> Has anyone faced a similar issue?
>
> I have a collection named “solr_test”. I created an alias to it as
> “solr_alias”.
> This alias works well when I do a simple search:
> http://localhost:8983/solr/solr_alias/select?indent=on&q=*:*&wt=json
>
> But, this will not work when used in a streaming expression:
>
> http://localhost:8983/solr/solr_alias/stream?expr=search(solr_alias,
> q=*:*, fl="p_PrimaryKey, p_name", qt="/select", sort="p_name asc")
>
> This gives me an error:
> "EXCEPTION": "java.lang.Exception: Collection not found:solr_alias"
>
> The same streaming query works when I use the actual collection name:
> “solr_test”
>
>
> Is this a limitation for aliases in solr? Or am I doing something wrong?
>
> Thanks,
> Lewin
>


solr alias not working on streaming query search

2017-07-05 Thread Lewin Joy (TMS)
** PROTECTED 関係者外秘

Has anyone faced a similar issue?

I have a collection named “solr_test”. I created an alias to it as “solr_alias”.
This alias works well when I do a simple search:
http://localhost:8983/solr/solr_alias/select?indent=on&q=*:*&wt=json

But, this will not work when used in a streaming expression:

http://localhost:8983/solr/solr_alias/stream?expr=search(solr_alias, q=*:*, 
fl="p_PrimaryKey, p_name", qt="/select", sort="p_name asc")

This gives me an error:
"EXCEPTION": "java.lang.Exception: Collection not found:solr_alias"

The same streaming query works when I use the actual collection name: 
“solr_test”


Is this a limitation for aliases in solr? Or am I doing something wrong?
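For reference, the working request against the concrete collection can be composed programmatically so the streaming expression is URL-encoded correctly. A minimal sketch (collection, field names, and endpoint taken from the example above; the concrete name is used since the alias fails in streaming expressions before Solr 6.4):

```python
from urllib.parse import urlencode

# Sketch: compose a /stream request with the expression URL-encoded.
# Uses the concrete collection name instead of the alias.
expr = 'search(solr_test, q=*:*, fl="p_PrimaryKey, p_name", qt="/select", sort="p_name asc")'
url = "http://localhost:8983/solr/solr_test/stream?" + urlencode({"expr": expr})
print(url)
```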

Thanks,
Lewin


RE: Estimating CPU

2017-06-20 Thread Lewin Joy (TMS)
Hmm. Thanks Erick and Markus. 
I'll check this.

-Lewin

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Tuesday, June 20, 2017 1:04 PM
To: solr-user@lucene.apache.org
Subject: RE: Estimating CPU

To add on Erick,

First thing that comes to mind: you also have a huge heap. Do you really need 
it to be that large? If not absolutely necessary, reduce it. If you need it 
because of FieldCache, consider DocValues instead and reduce the heap again.
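The DocValues suggestion corresponds to a schema change like the following sketch (field name and type are illustrative; a full re-index is required after enabling docValues):

```xml
<!-- Hypothetical schema.xml fragment: with docValues enabled, faceting and
     sorting read column-oriented on-disk structures instead of building the
     heap-resident FieldCache. -->
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
```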

Use tools like VisualVM to see what the CPU is doing. If it spends an 
unreasonable amount of time on garbage collection under small loads, your heap 
is probably too large.

Markus 
 
-Original message-
> From:Erick Erickson 
> Sent: Tuesday 20th June 2017 20:59
> To: solr-user 
> Subject: Re: Estimating CPU
> 
> In a word, "stress test". Here's the blog I wrote on the topic outlining
> why it's hard to give a more helpful answer:
> 
> https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> 
> You might want to explore the hyper-log-log approach which provides
> pretty good estimates without so many resources.
> 
> Best,
> Erick
> 
> On Tue, Jun 20, 2017 at 11:36 AM, Lewin Joy (TMS)  
> wrote:
> > ** PROTECTED 関係者外秘
> > Hi,
> >
> > Is there any way to estimate the CPU needed to set up a Solr environment?
> > We use pivot facets extensively. We use it in json facet api and also 
> > native queries.
> >
> > For our 150 million record collection, we are seeing high CPU usage of 100% 
> > with small loads.
> > If we have to increase our configuration, is there somehow we can estimate 
> > the CPU usage?
> >
> > We have five VMs with 8 CPU each and 32gb RAM, for which solr uses 24gb 
> > heap.
> >
> > Thanks,
> > Lewin
> 


Estimating CPU

2017-06-20 Thread Lewin Joy (TMS)
** PROTECTED 関係者外秘
Hi,

Is there any way to estimate the CPU needed to set up a Solr environment?
We use pivot facets extensively. We use it in json facet api and also native 
queries.

For our 150 million record collection, we are seeing high CPU usage of 100% 
with small loads.
If we have to increase our configuration, is there somehow we can estimate the 
CPU usage?

We have five VMs with 8 CPU each and 32gb RAM, for which solr uses 24gb heap.

Thanks,
Lewin


RE: Frequent mismatch in the numDocs between replicas

2016-11-23 Thread Lewin Joy (TMS)
** PROTECTED 関係者外秘

Hi,

Tried this. The explicit commit after indexing also does not help.
The document count on the leader is also wrong; it is not just the replicas 
having wrong numbers.
Both the leader and replicas have wrong counts, and the counts are also 
mismatched between replicas.
Sometimes, indexing does not reflect the new data unless Solr is restarted.

Usually after an optimize OR restart, we see that the counts in the leader and 
replicas match. 
And also the counts increase on both leaders and replicas. 
It is as if the inserted docs are not getting reflected anywhere. Not even on 
leaders.

It may have something to do with our code, because indexing via curl or DIH 
returns the proper counts.
Did something change in the way indexing works between Solr 5.4 and Solr 6 
that could be causing the issue?

-Lewin

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 22, 2016 8:40 AM
To: solr-user 
Subject: Re: Frequent mismatch in the numDocs between replicas

The autocommit settings on leaders and replicas
can be slightly offset in terms of wall clock time so
docs that have been committed on one node may
not have been committed on the other. Your comment
that you can optimize and fix this is evidence that this
is what you're seeing.

to test this:
1> stop indexing
2> issue a "commit" to the collection.

If that shows all replicas with the same count, then
the above is the explanation.
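Step 2 above can be issued as a plain update request with commit=true; a sketch of composing that URL (the collection name is illustrative):

```python
from urllib.parse import urlencode

# Sketch: an explicit hard commit that opens a new searcher, so every
# replica exposes the same committed documents before counting.
params = urlencode({"commit": "true", "openSearcher": "true"})
url = "http://localhost:8983/solr/my_collection/update?" + params
print(url)
```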

Best,
Erick

On Mon, Nov 21, 2016 at 6:52 PM, Lewin Joy (TMS)  wrote:
> ** PROTECTED 関係者外秘
> Hi,
>
> I am having a strange issue working with solr 6.1 cloud setup on zookeeper 
> 3.4.8
>
> Intermittently after I run Indexing, the replicas are having a different 
> record count.
> And even though there is this mismatch, it is still marked healthy and is 
> being used for queries.
> So, now I get inconsistent results based on the replica used for the query.
>
> This gets resolved after restarting solr servers. Or if I just do an optimize 
> on the collection.
>
> Any idea what could be wrong? Have any of you faced something similar?
> Is there some configuration or setting I should be checking?
>
>
> Thanks,
> Lewin

Frequent mismatch in the numDocs between replicas

2016-11-21 Thread Lewin Joy (TMS)
** PROTECTED 関係者外秘
Hi,

I am having a strange issue working with solr 6.1 cloud setup on zookeeper 3.4.8

Intermittently after I run Indexing, the replicas are having a different record 
count.
And even though there is this mismatch, it is still marked healthy and is being 
used for queries.
So, now I get inconsistent results based on the replica used for the query.

This gets resolved after restarting solr servers. Or if I just do an optimize 
on the collection.

Any idea what could be wrong? Have any of you faced something similar?
Is there some configuration or setting I should be checking?


Thanks,
Lewin


Average of Averages in Solr

2016-10-05 Thread Lewin Joy (TMS)
** PROTECTED 関係者外秘

Hi,

I have a big collection with around 100 million records.
There is a requirement to take an average of the "Amount" field for each "code" 
value, and then calculate the average of those averages.
Since my "code" field has very high cardinality, around 200,000 values or even 
millions, it gets highly complex to calculate the average of averages in Java.
Even Solr takes a long time listing the averages, and the JSON response 
becomes huge.
Is there some way we can tackle this? Is there any way to do stats on stats?
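One hedged workaround: request the per-code averages via the JSON Facet API and compute the second-level average client-side. A sketch with illustrative field/collection names and a mocked response (at this cardinality the bucket list itself is the bottleneck, so this only illustrates the shape):

```python
import json
from urllib.parse import urlencode

# Sketch: per-"code" averages via a terms facet with an avg() sub-aggregation.
# limit:-1 returns all buckets, which is exactly what gets heavy at 200k+ codes.
facet = {
    "by_code": {
        "type": "terms",
        "field": "code",
        "limit": -1,
        "facet": {"avg_amount": "avg(Amount)"},
    }
}
params = urlencode({"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)})

# Mocked response shaped like Solr's JSON facet output:
response = {"facets": {"by_code": {"buckets": [
    {"val": "A", "count": 3, "avg_amount": 10.0},
    {"val": "B", "count": 5, "avg_amount": 30.0},
]}}}
buckets = response["facets"]["by_code"]["buckets"]
avg_of_avgs = sum(b["avg_amount"] for b in buckets) / len(buckets)
print(avg_of_avgs)  # 20.0
```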

Thanks,
Lewin


unique( )- How to override default of 100

2016-08-04 Thread Lewin Joy (TMS)
** PROTECTED 関係者外秘
Hi,

I was looking at Solr's count-distinct feature with the unique and hll functions.
I am interested in getting accurate cardinality in a cloud setup.

As per the link, unique() function provides exact counts if the number of 
values per node does not exceed 100 by default.
How do I override this default to a much higher value?
Is it possible?
Refer: http://yonik.com/solr-count-distinct/
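If exact counts above the threshold are not practical, the hll() aggregation from that article gives approximate cardinality with bounded memory; a sketch composing the request (collection and field names are illustrative):

```python
import json
from urllib.parse import urlencode

# Sketch: approximate distinct count via the JSON Facet API's hll()
# aggregation instead of an exact unique() count.
facet = {"distinct_parts": "hll(part_num)"}
params = urlencode({"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)})
url = "http://localhost:8983/solr/collection1/query?" + params
print(url)
```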


Thanks,
Lewin



RE: Hitting complex multilevel pivot queries in solr

2016-02-24 Thread Lewin Joy (TMS)
Hi Alvaro, 

We had thought about this. But our requirement is dynamic. 
The 4 fields to pivot on would change as per the many requirements.
So, this will need to be handled at query time.

Just considering the Endeca equivalent, it looks easy there.
If this feature is not available in Solr, how much effort would it be to build 
it?

P.S.
The endeca equivalent query below:
RETURN Results as SELECT Count(1) as "Total" GROUP BY "Country", "State", 
"part_num", "part_code" ORDER BY "Total" desc PAGE(0,100)

-Lewin

-Original Message-
From: Alvaro Cabrerizo [mailto:topor...@gmail.com] 
Sent: Friday, February 19, 2016 1:02 AM
To: solr-user@lucene.apache.org
Subject: Re: Hitting complex multilevel pivot queries in solr

Hi,

The only way I can imagine is to create that auxiliary field for performing the 
facet on it. It means that you have to know "a priori" the kind of report 
(facet field) you need.

For example, if your current data (solrdocument) is:

{
   "id": 3757,
   "country": "CountryX",
   "state": "StateY",
   "part_num": "part_numZ",
   "part_code": "part_codeW"
}

It should be changed at index time to:

{
   "id": 3757,
   "country": "CountryX",
   "state": "StateY",
   "part_num": "part_numZ",
   "part_code": "part_codeW",
   "auxField": "CountryX StateY part_numZ part_codeW"
}

And then perform the query faceting by auxField.
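The auxiliary field can also be built client-side before indexing; a sketch of the suggestion above (auxField and the field list are taken from this thread; the helper name is illustrative):

```python
# Sketch: concatenate the pivot fields into one auxiliary field before
# indexing, then use facet.field=auxField at query time.
def with_aux_field(doc, parts=("country", "state", "part_num", "part_code")):
    out = dict(doc)
    out["auxField"] = " ".join(out[p] for p in parts)
    return out

doc = {
    "id": 3757,
    "country": "CountryX",
    "state": "StateY",
    "part_num": "part_numZ",
    "part_code": "part_codeW",
}
print(with_aux_field(doc)["auxField"])  # CountryX StateY part_numZ part_codeW
```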


Regards.

On Fri, Feb 19, 2016 at 1:15 AM, Lewin Joy (TMS) 
wrote:

> Hi,
>
> The fields are single valued. But, the requirement will be at query 
> time rather than index time. This is because, we will be having many 
> such scenarios with different fields.
> I hoped we could concatenate at query time. I just need top 100 counts 
> from the leaf level of the pivot.
> I'm also looking at facet.threads which could give responses to an extent.
> But It does not solve my issue.
>
> However, the Endeca equivalent of this application seems to be working 
> well.
> Example Endeca Query:
>
> RETURN Results as SELECT Count(1) as "Total" GROUP BY "Country", 
> "State", "part_num", "part_code" ORDER BY "Total" desc PAGE(0,100)
>
>
> -Lewin
>
>
> -Original Message-
> From: Alvaro Cabrerizo [mailto:topor...@gmail.com]
> Sent: Thursday, February 18, 2016 3:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Hitting complex multilevel pivot queries in solr
>
> Hi,
>
> The idea of copying fields into a new one (or various) during indexing 
> and then facet the new field (or fields) looks promising. More 
> information about data will be helpful (for example if the 
> fields:country, state.. are single or multivalued). For example if all 
> of the fields are single valued, then the combination of 
> country,state,part_num,part_code looks like a file path 
> country/state/part_num/part_code and maybe (don't know your business 
> rules), the solr.PathHierarchyTokenizerFactory
> <https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters> could 
> be an option to research instead of facet pivoting. On the other hand, 
> I don't think that the copy field < 
> https://cwiki.apache.org/confluence/display/solr/Copying+Fields> 
> feature can help you to build that auxiliary field. I think that 
> configuring an updateRequestProcessorChain < 
> https://wiki.apache.org/solr/UpdateRequestProcessor>and building your 
> own UpdateRequestProcessorFactory to concat the 
> country,state,part_num,part_code values can be a better way.
>
> Hope it helps.
>
> On Thu, Feb 18, 2016 at 8:47 PM, Lewin Joy (TMS) 
> 
> wrote:
>
> > Still splitting my head over this one.
> > Let me know if anyone has any idea I could try.
> >
> > Or, is there a way to concatenate these 4 fields onto a dynamic 
> > field and do a facet.field on top of this one?
> >
> > Thanks. Any idea is helpful to try.
> >
> > -Lewin
> >
> > -Original Message-
> > From: Lewin Joy (TMS) [mailto:lewin@toyota.com]
> > Sent: Wednesday, February 17, 2016 4:29 PM
> > To: solr-user@lucene.apache.org
> > Subject: Hitting complex multilevel pivot queries in solr
> >
> > Hi,
> >
> > Is there an efficient way to hit solr for complex time consuming queries?
> > I have a requirement where I need to pivot on 4 fields. Two fields 
> > contain facet values close to 50. And the other 2 fields have 5000 
> > and
> 8000 values.
> > Pivoting on the 4 fields would crash the server.
> >
> > Is there a better way to get the data?
> >
> > Example Query Params looks like this:
> > &facet.pivot=country,state,part_num,part_code
> >
> > Thanks,
> > Lewin
> >
> >
> >
> >
>


RE: Hitting complex multilevel pivot queries in solr

2016-02-18 Thread Lewin Joy (TMS)
Hi,

The fields are single valued. But, the requirement will be at query time rather 
than index time. This is because, we will be having many such scenarios with 
different fields.
I hoped we could concatenate at query time. I just need top 100 counts from the 
leaf level of the pivot.
I'm also looking at facet.threads, which could speed up responses to an extent, 
but it does not solve my issue.

However, the Endeca equivalent of this application seems to be working well. 
Example Endeca Query: 

RETURN Results as SELECT Count(1) as "Total" GROUP BY "Country", "State", 
"part_num", "part_code" ORDER BY "Total" desc PAGE(0,100)


-Lewin


-Original Message-
From: Alvaro Cabrerizo [mailto:topor...@gmail.com] 
Sent: Thursday, February 18, 2016 3:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Hitting complex multilevel pivot queries in solr

Hi,

The idea of copying fields into a new one (or various) during indexing and then 
facet the new field (or fields) looks promising. More information about data 
will be helpful (for example if the fields:country, state.. are single or 
multivalued). For example if all of the fields are single valued, then the 
combination of country,state,part_num,part_code looks like a file path 
country/state/part_num/part_code and maybe (don't know your business rules), 
the solr.PathHierarchyTokenizerFactory
<https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters> could be an 
option to research instead of facet pivoting. On the other hand, I don't think 
that the copy field 
<https://cwiki.apache.org/confluence/display/solr/Copying+Fields> feature can 
help you to build that auxiliary field. I think that configuring an 
updateRequestProcessorChain 
<https://wiki.apache.org/solr/UpdateRequestProcessor>and building your own 
UpdateRequestProcessorFactory to concat the country,state,part_num,part_code 
values can be a better way.

Hope it helps.

On Thu, Feb 18, 2016 at 8:47 PM, Lewin Joy (TMS) 
wrote:

> Still splitting my head over this one.
> Let me know if anyone has any idea I could try.
>
> Or, is there a way to concatenate these 4 fields onto a dynamic field 
> and do a facet.field on top of this one?
>
> Thanks. Any idea is helpful to try.
>
> -Lewin
>
> -Original Message-
> From: Lewin Joy (TMS) [mailto:lewin@toyota.com]
> Sent: Wednesday, February 17, 2016 4:29 PM
> To: solr-user@lucene.apache.org
> Subject: Hitting complex multilevel pivot queries in solr
>
> Hi,
>
> Is there an efficient way to hit solr for complex time consuming queries?
> I have a requirement where I need to pivot on 4 fields. Two fields 
> contain facet values close to 50. And the other 2 fields have 5000 and 8000 
> values.
> Pivoting on the 4 fields would crash the server.
>
> Is there a better way to get the data?
>
> Example Query Params looks like this:
> &facet.pivot=country,state,part_num,part_code
>
> Thanks,
> Lewin
>
>
>
>


RE: Hitting complex multilevel pivot queries in solr

2016-02-18 Thread Lewin Joy (TMS)
Still splitting my head over this one. 
Let me know if anyone has any idea I could try.

Or, is there a way to concatenate these 4 fields onto a dynamic field and do a 
facet.field on top of this one?

Thanks. Any idea is helpful to try.

-Lewin

-Original Message-
From: Lewin Joy (TMS) [mailto:lewin@toyota.com] 
Sent: Wednesday, February 17, 2016 4:29 PM
To: solr-user@lucene.apache.org
Subject: Hitting complex multilevel pivot queries in solr

Hi,

Is there an efficient way to hit solr for complex time consuming queries?
I have a requirement where I need to pivot on 4 fields. Two fields contain 
facet values close to 50. And the other 2 fields have 5000 and 8000 values. 
Pivoting on the 4 fields would crash the server.

Is there a better way to get the data?

Example Query Params looks like this:
&facet.pivot=country,state,part_num,part_code

Thanks,
Lewin





Hitting complex multilevel pivot queries in solr

2016-02-17 Thread Lewin Joy (TMS)
Hi,

Is there an efficient way to hit solr for complex time consuming queries?
I have a requirement where I need to pivot on 4 fields. Two fields contain 
facet values close to 50. And the other 2 fields have 5000 and 8000 values. 
Pivoting on the 4 fields would crash the server.

Is there a better way to get the data?

Example Query Params looks like this:
&facet.pivot=country,state,part_num,part_code

Thanks,
Lewin





RE: FieldCache

2016-01-14 Thread Lewin Joy (TMS)
Hi Toke,

Thanks for the reply. 
But grouping on a multivalued field is working for me, even with multiple 
values in the field.
I also tested this on the tutorial collection from the later Solr version 5.3.1, 
which works as well.
Maybe the wiki needs to be updated?

-Lewin

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Thursday, January 14, 2016 12:31 AM
To: solr-user@lucene.apache.org
Subject: Re: FieldCache

On Thu, 2016-01-14 at 00:18 +, Lewin Joy (TMS) wrote:
> I am working on Solr 4.10.3 on Cloudera CDH 5.4.4 and am trying to 
> group results on a multivalued field, let's say "interests".
...
> But after I just re-indexed the data, it started working.

Grouping is not supposed to be supported for multi-valued fields:
https://cwiki.apache.org/confluence/display/solr/Result+Grouping

I wonder if there might be an edge case where the field is marked as multiValued 
in schema.xml, but only contains single values?

- Toke Eskildsen, State and University Library, Denmark




FieldCache

2016-01-13 Thread Lewin Joy (TMS)
Hi,

I have been facing a weird issue in solr.

I am working on Solr 4.10.3 on Cloudera CDH 5.4.4 and am trying to group 
results on a multivalued field, let's say "interests".
This is giving me an error message below:

  "error": {
"msg": "can not use FieldCache on multivalued field: interests",
"code": 400
  }

I thought this could be a version issue. 
But after I just re-indexed the data, it started working.

I wanted to understand this error message and why it could be failing sometimes 
on multivalued fields.

Thanks,
-Lewin



Error: FieldCache on multivalued field

2016-01-13 Thread Lewin Joy (TMS)
*updated subject line

Hi,

I have been facing a weird issue in solr.

I am working on Solr 4.10.3 on Cloudera CDH 5.4.4 and am trying to group 
results on a multivalued field, let's say "interests".
This is giving me an error message below:

  "error": {
"msg": "can not use FieldCache on multivalued field: interests",
"code": 400
  }

I thought this could be a version issue. 
But after I just re-indexed the data, it started working.

I wanted to understand this error message and why it could be failing sometimes 
on multivalued fields.

Thanks,
-Lewin



RE: Is Pivoted Grouping possible?

2015-12-21 Thread Lewin Joy (TMS)
If there were even a way to have a string-concatenation function, we could 
produce similar result sets. Is that possible?

-Lewin

-Original Message-
From: Lewin Joy (TMS) [mailto:lewin@toyota.com] 
Sent: Monday, December 21, 2015 12:16 PM
To: solr-user@lucene.apache.org
Subject: Is Pivoted Grouping possible?

Hi,

I am working with Solr 4.10.3, and we are trying to retrieve documents under 
categories and sub-categories.
With grouping we are able to bring n number of records under each group.
Could we have a pivoted grouping where I could bring the results from 
sub-categories?

Example:


Apparel
Shirts
{id:1, Blue shirt}
{id:2, Green shirt}
Pants
{id:10, Blue Pants}
{id:20, Grey Pants}
Sports
Basketball
{id:45, Black Basketball}
{id:32, Basketball hoop}


I know we could bring the number of records under each sub-category using 
facet.pivot=category,sub-cat .
Also, grouping could give me records under each group.
Is there a way we could combine this to give us pivoting groups? Or is there an 
alternative to bring about these results?

Thanks,
Lewin


Is Pivoted Grouping possible?

2015-12-21 Thread Lewin Joy (TMS)
Hi,

I am working with Solr 4.10.3, and we are trying to retrieve documents under 
categories and sub-categories.
With grouping we are able to bring n number of records under each group.
Could we have a pivoted grouping where I could bring the results from 
sub-categories?

Example:


Apparel
Shirts
{id:1, Blue shirt}
{id:2, Green shirt}
Pants
{id:10, Blue Pants}
{id:20, Grey Pants}
Sports
Basketball
{id:45, Black Basketball}
{id:32, Basketball hoop}


I know we could bring the number of records under each sub-category using 
facet.pivot=category,sub-cat .
Also, grouping could give me records under each group.
Is there a way we could combine this to give us pivoting groups? Or is there an 
alternative to bring about these results?

Thanks,
Lewin


Instant Page Previews

2015-10-07 Thread Lewin Joy (TMS)
Hi,

Is there any way we can implement instant page previews in Solr?
Just saw that Google Search Appliance has this out of the box.
Just like what google.com had previously. We need to display the content of the 
result record when hovering over the link.

Thanks,
Lewin






RE: Morphline for Indexing Nested Document Structure

2015-09-11 Thread Lewin Joy (TMS)
Oh yes. We are upgrading Cloudera to get Solr 4.10 just for this block join 
feature.
But how do I index nested documents for block join with a dataset this huge?
I could not find any way to write the morphline file for this use case.

Thank you for the reply, Mikhail

-Lewin


-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Friday, September 11, 2015 2:13 PM
To: solr-user 
Subject: Re: Morphline for Indexing Nested Document Structure

Hello Lewin,

Block Join support is released in Solr 4.5.

On Fri, Sep 11, 2015 at 9:05 PM, Lewin Joy (TMS) 
wrote:

> Hi,
>
> I have a huge dataset of about 600 million documents.
> These documents are relational and I need to maintain the relation in solr.
>
> So, I am Indexing them as nested documents. It has nested documents 
> within nested documents.
> Now, my problem is how to index them.
>
> We are on Cloudera Solr 4.4 and using mapreduce Indexer.
> Can we specify this nested structure in the morphline file? For the 
> mapreduce or spark-submit, I need this handled through morphline.
>
> If this can't be done, is there an alternative that I can try?
>
> Thanks,
> Lewin
>



--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>



Morphline for Indexing Nested Document Structure

2015-09-11 Thread Lewin Joy (TMS)
Hi,

I have a huge dataset of about 600 million documents.
These documents are relational and I need to maintain the relation in solr.

So, I am Indexing them as nested documents. It has nested documents within 
nested documents.
Now, my problem is how to index them.

We are on Cloudera Solr 4.4 and using mapreduce Indexer.
Can we specify this nested structure in the morphline file? For the mapreduce 
or spark-submit, I need this handled through morphline.

If this can't be done, is there an alternative that I can try?
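For reference, outside the morphline path, Solr's JSON update handler accepts nested blocks via the _childDocuments_ key; a sketch of the document shape (field names are illustrative, and this shows the target shape rather than the morphline mapping):

```python
import json

# Sketch: nested "block" document shape for block-join indexing, with
# children nested inside children via _childDocuments_.
parent = {
    "id": "order-1",
    "type_s": "order",
    "_childDocuments_": [
        {
            "id": "item-1",
            "type_s": "item",
            "_childDocuments_": [
                {"id": "unit-1", "type_s": "unit"},
            ],
        },
    ],
}
payload = json.dumps([parent])  # POST to /update with Content-Type: application/json
print(payload)
```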

Thanks,
Lewin


RE: SOLR to pivot on date range query

2015-08-17 Thread Lewin Joy (TMS)
Hi Yonik,

Thank you for the reply. I followed your link, and this feature is really 
awesome to have.
But unfortunately I am using Solr 4.4 on Cloudera right now.
I tried this; it looks like it does not work in this version.
Sorry, I forgot to mention that in my original mail.

Thanks,
Lewin

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Monday, August 17, 2015 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR to pivot on date range query

The JSON Facet API can embed any type of facet within any other type:
http://yonik.com/json-facet-api/

json.facet={
  dates : {
    type : range,
    field : entryDate,
    start : "2001-...",  // use full solr date format
    end : "2015...",
    gap : "+1MONTH",
    facet : {
      entryTypes : {
        type : terms,
        field : entryType
      }
    }
  }
}

-Yonik


On Mon, Aug 17, 2015 at 3:16 PM, Lewin Joy (TMS)  wrote:
> Hi,
>
> I have data that is coming in everyday. I need to query the index for a time 
> range and give the facet counts ordered by different months.
> For this, I just have a solr date field, entryDate which captures the time.
>
> How do I make this query? I need the results like below.
>
> Jan-2015 (2000)
> entryType=Sales(750)
> entryType=Complaints(200)
> entryType=Feedback(450)
> Feb-2015(3200)
> entryType=Sales(1000)
> entryType=Complaints(250)
> entryType=Feedback(600)
> Mar-2015(2800)
> entryType=Sales(980)
> entryType=Complaints(220)
> entryType=Feedback(400)
>
>
> I tried Range queries on 'entryDate' field to order the result facets by 
> month.
> But, I am not able to pivot on the 'entryType' field to bring the counts of 
> "sales,complaints and feedback" type record by month.
>
> For now, I am creating another field at index time to have the value for 
> "MONTH-YEAR" derived from the 'entryDate' field.
> But for older records, it becomes a hassle. Is there a way I can handle this 
> at query time?
> Or is there a better way to handle this situation?
>
> Please let me know. Any thoughts / suggestions are valuable.
>
> Thanks,
> Lewin
>


SOLR to pivot on date range query

2015-08-17 Thread Lewin Joy (TMS)
Hi,

I have data that is coming in everyday. I need to query the index for a time 
range and give the facet counts ordered by different months.
For this, I just have a solr date field, entryDate which captures the time.

How do I make this query? I need the results like below.

Jan-2015 (2000)
entryType=Sales(750)
entryType=Complaints(200)
entryType=Feedback(450)
Feb-2015(3200)
entryType=Sales(1000)
entryType=Complaints(250)
entryType=Feedback(600)
Mar-2015(2800)
entryType=Sales(980)
entryType=Complaints(220)
entryType=Feedback(400)

 
I tried Range queries on 'entryDate' field to order the result facets by month. 
But, I am not able to pivot on the 'entryType' field to bring the counts of 
"sales,complaints and feedback" type record by month.

For now, I am creating another field at index time to have the value for 
"MONTH-YEAR" derived from the 'entryDate' field.
But for older records, it becomes a hassle. Is there a way I can handle this at 
query time? 
Or is there a better way to handle this situation?
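The index-time workaround described above can be sketched as follows (field names are illustrative):

```python
from datetime import datetime

# Sketch: derive a "MONTH-YEAR" field from the entryDate value before
# indexing, so facets can bucket by month directly.
def add_month_year(doc):
    dt = datetime.strptime(doc["entryDate"], "%Y-%m-%dT%H:%M:%SZ")
    doc["monthYear"] = dt.strftime("%b-%Y")
    return doc

doc = add_month_year({"entryDate": "2015-02-17T10:00:00Z", "entryType": "Sales"})
print(doc["monthYear"])  # Feb-2015
```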

Please let me know. Any thoughts / suggestions are valuable.

Thanks,
Lewin