Re: Solr for real time analytics system

2016-02-04 Thread Rohit Kumar
Thanks, Bhimavarapu, for the information.

We are creating our own dashboard, so we probably won't need Kibana/Banana. I
was more curious about Solr's support for fast aggregation queries over a very
large data set. As suggested, I guess Elasticsearch has this capability.
Are there any published metrics or data on Elasticsearch/Solr
performance in this area that I can refer to?

Thanks
Rohit



On Thu, Feb 4, 2016 at 11:48 AM, CKReddy Bhimavarapu <chaitu...@gmail.com>
wrote:

> Hello Rohit,
>
> You can use the Banana project which was forked from Kibana
> <https://github.com/elastic/kibana>, and works with all kinds of time
> series (and non-time series) data stored in Apache Solr
> <https://lucene.apache.org/solr/>. It uses Kibana's powerful dashboard
> configuration capabilities, ports key panels to work with Solr, and
> provides significant additional capabilities, including new panels that
> leverage D3.js <http://d3js.org/>.
>
> > I would need mostly aggregation queries like sum/average/groupby etc, but
> > the data set is quite huge. The aggregation queries should be very fast.
>
>
> All your requirements can be served by Banana, but I'm not sure how fast
> Solr is compared to ELK <https://www.elastic.co/products>.
>
> On Thu, Feb 4, 2016 at 10:51 AM, Rohit Kumar <rohitkumarbhagat...@gmail.com>
> wrote:
>
> > Hi
> >
> > I am quite new to Solr. I have to build a real time analytics system which
> > displays metrics based on multiple filters over a huge data set (~50 million
> > documents with ~100 fields). I would need mostly aggregation queries like
> > sum/average/groupby etc, but the data set is quite huge. The aggregation
> > queries should be very fast.
> >
> > Is Solr suitable for such use cases?
> >
> > Thanks
> > Rohit
> >
>
>
>
> --
> ckreddybh. <chaitu...@gmail.com>
>


Solr for real time analytics system

2016-02-03 Thread Rohit Kumar
Hi

I am quite new to Solr. I have to build a real time analytics system which
displays metrics based on multiple filters over a huge data set (~50 million
documents with ~100 fields). I would need mostly aggregation queries like
sum/average/groupby etc, but the data set is quite huge. The aggregation
queries should be very fast.

Is Solr suitable for such use cases?

Thanks
Rohit
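For reference, the kind of aggregation described here (group-by with sum/average per group, under dashboard filters) maps onto the JSON Facet API introduced in Solr 5.1. Below is a minimal sketch that only builds the request URL; the core URL and the region, revenue and country fields are hypothetical, and nothing here says anything about query speed at ~50 million documents.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: builds a /select URL asking Solr's JSON Facet API (Solr 5.1+)
// for per-group sum/avg aggregations. Core URL and field names are hypothetical.
class AggregationQuery {

    // URL-encode a parameter value; UTF-8 is always available on the JVM.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Terms facet on groupField, with sum and avg of metricField per bucket.
    static String facetJson(String groupField, String metricField) {
        return "{groups:{type:terms,field:" + groupField
                + ",facet:{total:\"sum(" + metricField + ")\","
                + "average:\"avg(" + metricField + ")\"}}}";
    }

    static String buildUrl(String coreUrl, String filter,
                           String groupField, String metricField) {
        return coreUrl + "/select?q=" + enc("*:*")
                + "&fq=" + enc(filter)   // a dashboard filter becomes an fq clause
                + "&rows=0"              // only the aggregations are needed, not documents
                + "&wt=json"
                + "&json.facet=" + enc(facetJson(groupField, metricField));
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr/metrics",
                "country:IN", "region", "revenue"));
    }
}
```

Setting rows=0 keeps the response down to the aggregation buckets; each additional dashboard filter would be another fq parameter.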


Section Search in SOLR

2013-09-28 Thread Rohit Kumar
Hi,

I have the following Solr documents indexed.

<doc>
  <str name="id">1</str>
  <arr name="companyName">
    <str>Boeing</str>
    <str>Kaseya</str>
  </arr>
  <arr name="positionName">
    <str>Executive</str>
    <str>Technician</str>
  </arr>
</doc>

<doc>
  <str name="id">2</str>
  <arr name="companyName">
    <str>Boeing</str>
    <str>Kodak</str>
  </arr>
  <arr name="positionName">
    <str>Technician</str>
    <str>Executive</str>
  </arr>
</doc>


Company name and position name are multivalued fields maintained in order.

The following Solr query,
*fq=companyName:Boeing&fq=positionName:Executive*, returns both
documents, as expected.
What changes will I have to make to be able to search for
companyName:Boeing and positionName:Executive at the same index in the
corresponding multivalued fields, i.e. so that it returns only doc id 1?


Thanks,
Rohit Kumar
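To pin down what "same index" means, here is a small client-side check in plain Java (not something Solr does) applied to the two example documents; it is the semantics the filter queries would need to express. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: the intended "same index" match, evaluated client-side.
// Solr itself keeps no correspondence between two multivalued fields.
class ParallelFieldMatch {

    static boolean matchesAtSameIndex(List<String> companies, List<String> positions,
                                      String company, String position) {
        int n = Math.min(companies.size(), positions.size());
        for (int i = 0; i < n; i++) {
            // both conditions must hold for the same value index i
            if (companies.get(i).equals(company) && positions.get(i).equals(position)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean doc1 = matchesAtSameIndex(Arrays.asList("Boeing", "Kaseya"),
                Arrays.asList("Executive", "Technician"), "Boeing", "Executive");
        boolean doc2 = matchesAtSameIndex(Arrays.asList("Boeing", "Kodak"),
                Arrays.asList("Technician", "Executive"), "Boeing", "Executive");
        System.out.println("doc 1: " + doc1 + ", doc 2: " + doc2); // prints doc 1: true, doc 2: false
    }
}
```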


Re: Section Search in SOLR

2013-09-28 Thread Rohit Kumar
Thanks, Jack, for the quick reply.


Perhaps my question was not detailed enough. Let me explain further.

*Option 1:*
Even if I flatten my document to store the separate *experiences* in a
multivalued field, Solr still returns doc ids 1 and 2 if I query
*fq=experience:Boeing&fq=experience:Executive*

<doc>
  <str name="id">1</str>
  <arr name="companyName">
    <str>Boeing</str>
    <str>Kaseya</str>
  </arr>
  <arr name="positionName">
    <str>Executive</str>
    <str>Technician</str>
  </arr>
  <arr name="experience">
    <str>Boeing, Executive</str>
    <str>Kaseya, Technician</str>
  </arr>
</doc>

<doc>
  <str name="id">2</str>
  <arr name="companyName">
    <str>Boeing</str>
    <str>Kodak</str>
  </arr>
  <arr name="positionName">
    <str>Technician</str>
    <str>Executive</str>
  </arr>
  <arr name="experience">
    <str>Boeing, Technician</str>
    <str>Kodak, Executive</str>
  </arr>
</doc>
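A minimal sketch of the flattening step: pair companyName[i] with positionName[i] to produce the experience values shown above, then filter on one whole pair with a single fq instead of two independent ones. This assumes experience is a string (untokenized) field, so the quoted value must equal a complete stored value; under that assumption doc 2 does not match, because neither "Boeing, Technician" nor "Kodak, Executive" equals "Boeing, Executive". Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: build "Company, Position" values from two parallel lists and a
// single filter that must match one whole pair. Assumes "experience" is a
// string (untokenized) field.
class ExperienceFlattener {

    static List<String> flatten(List<String> companies, List<String> positions) {
        if (companies.size() != positions.size()) {
            throw new IllegalArgumentException("companyName and positionName must be parallel lists");
        }
        List<String> experience = new ArrayList<>();
        for (int i = 0; i < companies.size(); i++) {
            experience.add(companies.get(i) + ", " + positions.get(i));
        }
        return experience;
    }

    // One fq on the combined value, instead of one fq per component.
    static String pairFilter(String company, String position) {
        return "experience:\"" + company + ", " + position + "\"";
    }

    public static void main(String[] args) {
        List<String> doc2 = flatten(Arrays.asList("Boeing", "Kodak"),
                Arrays.asList("Technician", "Executive"));
        System.out.println(String.join(" | ", doc2)); // prints Boeing, Technician | Kodak, Executive
        System.out.println(pairFilter("Boeing", "Executive")); // prints experience:"Boeing, Executive"
    }
}
```

If experience were a tokenized text field instead, the same quoted value becomes a phrase query, which relies on the field's positionIncrementGap to keep a phrase from spanning two different values.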


*Option 2:*
Store each experience in a separate field and generate the query
q=(exp1:(Boeing AND Executive) OR exp2:(Boeing AND Executive)), which
returns only the docs with the expected match.

<doc>
  <str name="id">2</str>
  ...
  <str name="exp1">Boeing, Executive</str>
  <str name="exp2">Kodak, Executive</str>
</doc>
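Option 2's query can be generated for any number of per-experience fields. A minimal sketch, assuming fields named exp1 ... expN and single-word company and position values (multi-word values would need quoting or escaping); the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: OR together one "(company AND position)" clause per experience field.
// Assumes fields named <prefix>1 .. <prefix>N and single-word values.
class ExperienceQueryBuilder {

    static String build(String fieldPrefix, int fieldCount, String company, String position) {
        String clause = "(" + company + " AND " + position + ")";
        List<String> parts = new ArrayList<>();
        for (int i = 1; i <= fieldCount; i++) {
            parts.add(fieldPrefix + i + ":" + clause);
        }
        return "(" + String.join(" OR ", parts) + ")";
    }

    public static void main(String[] args) {
        System.out.println(build("exp", 2, "Boeing", "Executive"));
        // prints (exp1:(Boeing AND Executive) OR exp2:(Boeing AND Executive))
    }
}
```

The query grows linearly with the maximum number of experiences a document can hold, and the schema needs one field per slot.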

Please suggest.

I would love to know how LinkedIn shows facets for people working at a
company with particular titles.

Thanks




On Sat, Sep 28, 2013 at 9:58 PM, Jack Krupansky j...@basetechnology.com wrote:

 multivalued fields maintained in order

 That is not a feature supported by Solr.

 Solr will maintain the order of an individual multivalued field and will
 return the values of that field in order, but makes no other use of the
 order.

 Ditto for corresponding multivalued fields. Solr does not support any
 correspondence between multivalued fields.

 You must flatten your data to achieve any correspondence.

 Multivalued fields are a powerful feature of Solr, but you must be
 extremely careful to use them only in moderation.

 -- Jack Krupansky

 -Original Message- From: Rohit Kumar
 Sent: Saturday, September 28, 2013 12:11 PM
 To: solr-user@lucene.apache.org
 Subject: Section Search in SOLR





Frequent softCommits leading to high faceting times?

2013-09-15 Thread Rohit Kumar
Hi,

We are running *Solr 4.3* with an 8 GB index on:

Ubuntu 12.04 64-bit
Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz, single core
16 GB RAM


We just started using the autoSoftCommit feature and noticed that facet
queries, which used to take milliseconds, now take about a minute. We have
*8 facet fields*.

We add close to 300 documents per second during peak periods.

<autoCommit>
  <maxTime>60</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>


Here is some information I got with debugQuery. Please note that the *facet
time is more than 50 seconds.*

<lst name="process">
  <double name="time">50779.0</double>
  <lst name="network">
    <double name="time">0.0</double>
  </lst>
  <lst name="query">
    <double name="time">41.0</double>
  </lst>
  <lst name="facet">
    <double name="time">50590.0</double>
  </lst>
  <lst name="mlt">
    <double name="time">0.0</double>
  </lst>
  <lst name="highlight">
    <double name="time">0.0</double>
  </lst>
  <lst name="stats">
    <double name="time">0.0</double>
  </lst>
  <lst name="connection">
    <double name="time">5.0</double>
  </lst>
  <lst name="debug">
    <double name="time">143.0</double>
  </lst>
</lst>

Please help.

Thanks,
Rohit Kumar


Searching solr on school name during year

2013-09-08 Thread Rohit Kumar
Hi,

Currently I have a student search which allows me to search for student
documents in a school. I am looking at adding a year search to the existing
schema, which would enable users to search for students in a school during a
given year. I have proposed a change to the schema that adds the year
component to support this search.


Existing schema (no year information currently):

<field name="id" type="string" indexed="true" stored="true" required="true"
    multiValued="false" />
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="schoolName" type="text_general" indexed="true" stored="true"
    multiValued="true"/>

Current sample data:
name:Borris Mayers
schoolName:Canterbury University




New schema:

<field name="id" type="string" indexed="true" stored="true" required="true"
    multiValued="false" />
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="schoolName" type="text_general" indexed="true" stored="true"
    multiValued="true"/>
<field name="schoolNameWithTermOriginal" type="string" indexed="false"
    stored="true" multiValued="true"/>


Sample data:

name:Borris Mayers
schoolName:Canterbury University, start_2001, year_2001, year_2002,
year_2003, year_2004, year_2005, end_2005
schoolNameWithTermOriginal:Canterbury University||2001-2005
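The proposed token scheme can be generated mechanically at index time. A minimal sketch that reproduces the sample values for Borris Mayers; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: expand one school attendance into the values proposed for the
// multivalued schoolName field, plus the stored original value.
class SchoolYearTokens {

    static List<String> schoolNameValues(String school, int startYear, int endYear) {
        if (endYear < startYear) {
            throw new IllegalArgumentException("endYear must not be before startYear");
        }
        List<String> values = new ArrayList<>();
        values.add(school);
        values.add("start_" + startYear);
        for (int y = startYear; y <= endYear; y++) {
            values.add("year_" + y);   // one token per year attended
        }
        values.add("end_" + endYear);
        return values;
    }

    static String originalValue(String school, int startYear, int endYear) {
        return school + "||" + startYear + "-" + endYear;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", schoolNameValues("Canterbury University", 2001, 2005)));
        // prints Canterbury University, start_2001, year_2001, year_2002, year_2003, year_2004, year_2005, end_2005
        System.out.println(originalValue("Canterbury University", 2001, 2005));
        // prints Canterbury University||2001-2005
    }
}
```

One caveat: if a student has several schools in the same multivalued schoolName field, the year tokens of one school are not tied to that school's name, so a query for school X and year_2003 could match a student who attended X in a different year.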


Please suggest whether this is a correct approach or whether there is a
better way to do the same.
I am using Solr 4.3.


Thanks,
Rohit Kumar


Searching in stopwords

2013-07-27 Thread Rohit Kumar
I have a company search which uses stopwords at query time. My stopwords
list has entries like:

HR
Club
India
Pvt.
Ltd.



So if I search for companies like HR Club, I get no results. Similarly, a
search for India HR gives no results. How can I get results for queries on
the following companies:

1. HR India
2. HR Club
3. HR India Pvt Ltd


I would still like to keep the above list of stopwords, since these
words occur heavily in company text.

Please advise whether I need to change my strategy altogether.

<field name="company" type="text_lowercase_whitespace" indexed="true"
    stored="true" />



<fieldType name="text_lowercase_whitespace" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
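To see why those queries come back empty, here is a simplified Java simulation of whitespace tokenizing plus a case-insensitive stop filter using the list above. It is not Lucene's analysis chain (no StandardTokenizer, no Porter stemming); it only illustrates that when every query term is a stopword, no terms are left to search for. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified simulation (NOT Lucene): whitespace-split the text, drop terms in
// the stopword list (compared lowercased, like ignoreCase="true"), lowercase the rest.
class StopwordSimulation {

    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("hr", "club", "india", "pvt.", "ltd."));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input yields a single empty string
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("HR Club"));  // prints []
        System.out.println(analyze("India HR")); // prints []
    }
}
```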



Thanks
Rohit Kumar


Using Solr to search between two Strings without using index

2013-07-25 Thread Rohit Kumar
Hi,

I have the following scenario.

String[] array = {"Input1 is good", "Input2 is better", "Input2 is sweet",
    "Input3 is bad"};

I want to compare the string array against the given input:
String[] inputArray = {"Input1", "Input2"};


No index is involved. I just want to use the power of string search to do
a runtime search on the array, which should return

["Input1 is good", "Input2 is better", "Input2 is sweet"]
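The matching described here does not need Solr at all; a plain substring filter in Java produces the expected output. A minimal sketch (class and method names hypothetical). Note that String.contains matches substrings, so "Input1" would also match "Input10 is ok"; if Lucene-style analysis (tokenizing, lowercasing, stemming) is wanted without a persistent index, Lucene's MemoryIndex is one option to look at.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: in-memory filtering with String.contains; no Solr index involved.
class InMemoryStringSearch {

    static List<String> search(List<String> candidates, List<String> inputs) {
        List<String> hits = new ArrayList<>();
        for (String candidate : candidates) {
            for (String input : inputs) {
                if (candidate.contains(input)) {
                    hits.add(candidate);
                    break; // add each candidate at most once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> array = Arrays.asList("Input1 is good", "Input2 is better",
                "Input2 is sweet", "Input3 is bad");
        List<String> inputArray = Arrays.asList("Input1", "Input2");
        System.out.println(search(array, inputArray));
        // prints [Input1 is good, Input2 is better, Input2 is sweet]
    }
}
```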



Thanks


Auto Soft commit not working !!!

2013-07-04 Thread Rohit Kumar
My Solr config has:

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- softAutoCommit is like autoCommit except it causes a
     'soft' commit which only ensures that changes are visible
     but does not ensure that data is synced to disk.  This is
     faster and more near-realtime friendly than a hard commit.
-->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>


Machine is Ubuntu 13 / 4 cores / 16 GB RAM, with 6 GB given to Solr running
on Tomcat.


Still, when I add documents to Solr and search, it returns 0 hits. It takes
a long time before the documents actually start showing up.

Can somebody help?

Thanks


Re: Auto Soft commit not working !!!

2013-07-04 Thread Rohit Kumar
I checked the Tomcat logs. Although the config says to commit every
15000 ms,

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>


strangely there are no commit messages in the logs. Did I miss anything?


-

I am having issues with soft auto commit (near real time). I am using Solr 4.0
on Tomcat. The index size is 10.95 GB. With this configuration it takes more
than 60 seconds for an indexed document to be returned. When I add documents
to Solr and search after the soft commit interval, it returns 0 hits. It takes
a long time, even longer than the autoCommit interval, before the documents
actually start showing up.

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Machine is Ubuntu 13 / 4 cores / 16 GB RAM, with 6 GB given to Solr running
on Tomcat.








On Fri, Jul 5, 2013 at 12:13 AM, Daniel Collins danwcoll...@gmail.com wrote:

 You should see the commit messages in the Solr logs; do they come up at the
 expected frequency?


 On 4 July 2013 15:35, Rohit Kumar rohit.kku...@gmail.com wrote:




Re: Auto Soft commit not working !!!

2013-07-04 Thread Rohit Kumar
1. Do you have an update processor chain that doesn't have RunUpdate in it? *No.*

2. Is the updateLog solrconfig directive missing? *Bang on. It was
still commented out!*

3. Is _version_ missing from your schema? *Checked it, and it's present.*

I will test again and update soon.

Thanks
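Since the commented-out updateLog directive (item 2) looks like the culprit here, a quick sanity check is whether solrconfig.xml contains an active updateLog element. A minimal sketch using the JDK's DOM parser; the class name is hypothetical and the config strings in main() are trimmed examples, not a complete solrconfig.xml. It relies on the fact that XML comments are not elements, so a commented-out updateLog is not found.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Sketch: report whether a solrconfig.xml snippet contains an active
// <updateLog> element. Comments are not elements, so getElementsByTagName()
// does not return a commented-out <updateLog>.
class UpdateLogCheck {

    static boolean hasActiveUpdateLog(String solrConfigXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrConfigXml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("updateLog").getLength() > 0;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse config", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed example snippets, not a complete solrconfig.xml
        String commented = "<config><updateHandler class=\"solr.DirectUpdateHandler2\">"
                + "<!-- <updateLog><str name=\"dir\">${solr.ulog.dir:}</str></updateLog> -->"
                + "</updateHandler></config>";
        String active = "<config><updateHandler class=\"solr.DirectUpdateHandler2\">"
                + "<updateLog><str name=\"dir\">${solr.ulog.dir:}</str></updateLog>"
                + "</updateHandler></config>";
        System.out.println(hasActiveUpdateLog(commented)); // prints false
        System.out.println(hasActiveUpdateLog(active));    // prints true
    }
}
```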



On Fri, Jul 5, 2013 at 8:30 AM, Jack Krupansky j...@basetechnology.com wrote:

 1. Do you have an update processor chain that doesn't have RunUpdate in it?

 2. Is the updateLog solrconfig directive missing?

 3. Is _version_ missing from your schema?

 -- Jack Krupansky

 -Original Message- From: Rohit Kumar
 Sent: Thursday, July 04, 2013 9:22 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Auto Soft commit not working !!!







SOLR : ArrayIndexOutOfBoundsException from SolrDispatchFilter

2013-06-19 Thread Rohit Kumar
I need help figuring out the error below.


*Code Snippet*:

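import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ConnectionComponent extends SearchComponent {

  // The other SearchComponent methods are omitted from this snippet.

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    NamedList<Object> nList = new SimpleOrderedMap<Object>();
    NamedList<Object> nl = new SimpleOrderedMap<Object>();

    List<Document> ld = new ArrayList<Document>();
    Document mydoc = new Document();
    mydoc.add(f); // IndexableField f is not null
    ld.add(mydoc);

    nl.add("someKey", ld);
    nList.add("otherKey", nl);

    // rb is the ResponseBuilder instance
    rb.rsp.add("returnKey", nList);
  }
}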


ERROR org.apache.solr.servlet.SolrDispatchFilter - null:java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.get(ArrayList.java:324)
at java.util.Collections$UnmodifiableList.get(Collections.java:1152)
at org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:92)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:165)
at org.apache.solr.response.JSONWriter.writeArray(JSONResponseWriter.java:526)
at org.apache.solr.response.TextResponseWriter.writeArray(TextResponseWriter.java:289)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:192)
at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)