I run a small SolrCloud cluster (4.5) of 3 nodes, with 3 collections of 3
shards each. Total index size per node is about 20GB, with about 70M
documents.
Under regular traffic (27-50 rpm) performance is OK and response times
range from 100 to 500ms.
But when I start loading (overwriting) 70M
I have to modify a schema so that I can attach nested per-store pricing
information to a product. For example:
10010137332: {
  title: iPad 64gb
  description: iPad 64gb with retina
  pricing: {
    merchantid64354: {
      locationid643: {
        USD|600
      }
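One common way to model this in Solr without true nesting is to flatten the pricing into dynamic fields. This is a sketch under the assumption that a *_s dynamic field exists in schema.xml; the price_<merchant>_<location>_s naming convention is invented here, not taken from the thread:

```python
def flatten_pricing(doc_id, title, description, pricing):
    """Flatten nested merchant/location pricing into one flat Solr
    document, using a dynamic-field name per merchant+location pair."""
    doc = {"id": doc_id, "title": title, "description": description}
    for merchant, locations in pricing.items():
        for location, price in locations.items():
            # assumes a *_s dynamic field is declared in schema.xml
            doc[f"price_{merchant}_{location}_s"] = price
    return doc

doc = flatten_pricing(
    "10010137332",
    "iPad 64gb",
    "iPad 64gb with retina",
    {"merchantid64354": {"locationid643": "USD|600"}},
)
print(doc["price_merchantid64354_locationid643_s"])  # USD|600
```

The trade-off is an unbounded number of field names as merchants and locations grow, which is why this only works for modest cardinalities.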
I see a sudden drop in throughput once every 3-4 days. The downtime lasts
about 2-6 minutes and things stabilize after that.
But I am not sure what is causing the problem.
I have 3 shards with 20GB of data on each shard.
Solr dashboard: http://i.imgur.com/6RWT2Dj.png
New Relic graphs from that period on these machines.
6. top screenshot: http://i.imgur.com/g6w9Bim.png
Thanks!
On Tue, Apr 8, 2014 at 4:48 PM, Shawn Heisey s...@elyograg.org wrote:
On 4/8/2014 5:30 PM, Utkarsh Sengar wrote:
I see a sudden drop in throughput once every 3-4 days. The downtime is for
about 2-6 minutes and things
,
-Utkarsh
On Tue, Apr 8, 2014 at 5:37 PM, Shawn Heisey s...@elyograg.org wrote:
On 4/8/2014 6:00 PM, Utkarsh Sengar wrote:
Lots of questions indeed :)
1. Total virtual machines: 3
2. Replication factor: 0 (don't have any replicas yet)
3. Each machine has 1 shard which has 20GB of data
Hi Rashmi,
Relevancy needs some kind of training data, which can lead to a chicken-and-egg
problem. If you don't have that training set, then you need to come up
with it or train manually (provide some seed).
Our existing search had 2 years' worth of clickstream data, i.e. we know if
someone searches
,merchantid_800
892828282,[82,82],
922932932,,[22,23]
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201208.mbox/%3CCAEFAe-Hew1CKk=EyqACFUTKqGHExXZLSHtyrgym09aYQVJf=t...@mail.gmail.com%3E
Thanks,
-Utkarsh
On Fri, Jan 24, 2014 at 12:05 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Hi
Hi guys,
I have to load extra meta data to an existing collection.
This is what I am looking for:
For a UPC: Store availability by merchantId per location (which has lat/lon)
My query pattern will be: Given a keyword, find all available products for
a merchantId around the given lat/lon.
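The query pattern above (keyword, restricted to a merchant, around a lat/lon) is usually expressed as a q plus two fq parameters with a geofilt. A sketch of building that request; the field names store and merchantId, the core name prodinfo, and the existence of a solr.LatLonType location field are all assumptions:

```python
from urllib.parse import urlencode

def availability_query(keyword, merchant_id, lat, lon, radius_km=10):
    """Build Solr query params: keyword search, filtered to one
    merchant, within radius_km of (lat, lon) via {!geofilt}."""
    params = {
        "q": keyword,
        "fq": [
            f"merchantId:{merchant_id}",
            # sfield must be a spatial field type in the schema
            f"{{!geofilt sfield=store pt={lat},{lon} d={radius_km}}}",
        ],
        "wt": "json",
    }
    return "/solr/prodinfo/select?" + urlencode(params, doseq=True)

url = availability_query("ipad", "merchantid64354", 37.77, -122.41)
print(url)
```

Keeping the merchant and geo constraints in fq (rather than q) lets Solr cache them independently of the keyword.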
I am not sure what happened, I updated merchant collection and then
restarted all the solr machines.
This is what I see right now: http://i.imgur.com/4bYuhaq.png
The merchant collection looks fine. But the deals and prodinfo collections should
each have a total of 3 shards. But somehow shard1 has converted
solr 4.4.0
On Wed, Jan 22, 2014 at 3:12 PM, Mark Miller markrmil...@gmail.com wrote:
What version of Solr are you running?
- Mark
On Jan 22, 2014, 5:42:30 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote: I am not sure what happened, I updated merchant collection and then
restarted all
, 2014, 6:14:10 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote: solr 4.4.0
On Wed, Jan 22, 2014 at 3:12 PM, Mark Miller markrmil...@gmail.com
wrote:
What version of Solr are you running?
- Mark
On Jan 22, 2014, 5:42:30 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote: I am not sure
I am experimenting with implementing a price drop feature.
Can I register some document's fields and trigger some sort of events if
the values change in those fields?
For example:
1. Price of itemX is $10
2. Say the price changes to $17 or $5 (increases or decreases) when the new
data loads.
3.
http://sematext.com/
On Dec 27, 2013 6:19 PM, Otis Gospodnetic otis.gospodne...@gmail.com
wrote:
Hi,
This sounds like it would be best implemented outside the search engine.
Otis
Solr ElasticSearch Support
http://sematext.com/
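A sketch of what "outside the search engine" could look like for the price-drop feature: diff each new data load against the previous one before indexing, and emit events for changed prices. This is purely illustrative; none of it is a Solr API:

```python
def detect_price_events(old_prices, new_prices, threshold=0.0):
    """Compare the previous load's prices with the new batch and
    return (item, old, new) tuples for every change larger than
    threshold, before the new data is sent to the search engine."""
    events = []
    for item, new in new_prices.items():
        old = old_prices.get(item)
        if old is not None and abs(new - old) > threshold:
            events.append((item, old, new))
    return events

# itemX drops from $10 to $5; itemY is new, so no event is emitted
events = detect_price_events({"itemX": 10.0}, {"itemX": 5.0, "itemY": 3.0})
print(events)  # [('itemX', 10.0, 5.0)]
```

The events can then feed notifications or a "price drop" flag that gets indexed alongside the document.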
On Dec 27, 2013 4:29 PM, Utkarsh Sengar utkarsh2
Also, attorney:(Roger Miller) is the same as attorney:Roger Miller, right? Or
is the whole term Roger Miller run against attorney?
Thanks,
-Utkarsh
On Tue, Nov 19, 2013 at 12:42 PM, Rafał Kuć r@solr.pl wrote:
Hello!
In the first one, the two terms 'Roger' and 'Miller' are run against
the attorney
Bumping this one again, any suggestions?
On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Hello,
I load data from csv to solr via UpdateCSV. There are about 50M documents
with 10 columns in each document. The index size is about 15GB and I am
using a 3 node
at 11:22 AM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Bumping this one again, any suggestions?
On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Hello,
I load data from csv to solr via UpdateCSV. There are about 50M
documents
with 10 columns
?) with commits at each point.
You will have a bottleneck somewhere, usually disk or CPU. Yours appears
to be disk. If you get faster disks, it might become the CPU.
wunder
On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Bumping this one again, any suggestions
Hello,
I load data from csv to solr via UpdateCSV. There are about 50M documents
with 10 columns in each document. The index size is about 15GB and I am
using a 3 node distributed solr cluster.
While loading the data, the disk IO goes to 100%. If the load balancer in
front of solr hits the
We use this to start/stop solr:
Start:
java -Dsolr.clustering.enabled=true -Dsolr.solr.home=multicore
-Djetty.class.path=lib/ext/* -Dbootstrap_conf=true -DnumShards=3
-DSTOP.PORT=8079 -DSTOP.KEY=some_value -jar start.jar
Stop:
java -Dsolr.solr.home=multicore -Dbootstrap_conf=true -DnumShards=3
Bumping this one, any suggestions?
Looks like if() and exists() are meant to solve this problem, but I am
using them in the wrong way.
-Utkarsh
On Thu, Oct 17, 2013 at 1:16 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
I am trying to do this:
if (US_offers_i exists):
fq=US_offers_i:[1
Thanks Chris! That worked!
I overengineered my query!
Thanks,
-Utkarsh
On Fri, Oct 18, 2013 at 12:02 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: I trying to do this:
:
: if (US_offers_i exists):
:fq=US_offers_i:[1 TO *]
: else:
:fq=offers_count:[1 TO *]
if() and
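Chris's reply is truncated here. For reference, one common way if() and exists() are combined for this kind of conditional filter is a single frange fq; this is an assumption about the shape of the fix, not necessarily the exact query he suggested:

```python
def conditional_range_fq(primary="US_offers_i",
                         fallback="offers_count", lower=1):
    """Build one Solr fq that filters on `primary` when that field
    exists on a document and on `fallback` otherwise, using the
    frange, if() and exists() function queries."""
    # {!frange l=N} keeps documents where the function value >= N
    return f"{{!frange l={lower}}}if(exists({primary}),{primary},{fallback})"

fq = conditional_range_fq()
print(fq)  # {!frange l=1}if(exists(US_offers_i),US_offers_i,offers_count)
```

A single fq like this replaces the two-branch pseudocode below with one expression Solr can evaluate per document.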
I am trying to do this:
if (US_offers_i exists):
fq=US_offers_i:[1 TO *]
else:
fq=offers_count:[1 TO *]
Where:
US_offers_i is a dynamic field containing an int
offers_count is a static field containing an int.
I have tried this so far but it doesn't work:
, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Didn't help.
This is the complete data: https://gist.github.com/utkarsh2012/6927649 (see
the merchantList column).
I tried this URL:
curl '
http://localhost/solr/coll1/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator
Hello,
I am trying to use split: http://wiki.apache.org/solr/UpdateCSV#split while
loading some csv data via updateCSV.
This is the field:
<field name="merchantList" type="string" indexed="true" stored="true"
multiValued="true" omitNorms="true" termVectors="false"
termPositions="false" termOffsets="false"/>
This
: text/csv -d '
id,name,features
doc-1,doc1,feat1:feat2'
You may need to add stream.contentType=text/csv to your command.
-- Jack Krupansky
-Original Message- From: Utkarsh Sengar
Sent: Thursday, October 10, 2013 4:51 PM
To: solr-user@lucene.apache.org
Subject: Using split
=true&f.features.separator=%3A&f.features.encapsulator=%22;
-H Content-Type: text/csv -d '
id,name,features
doc-1,doc1,feat1:feat2'
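Put together, the per-field split parameters for UpdateCSV can be built like this; the core name coll1 is a placeholder, and only the URL construction is shown, not the HTTP call:

```python
from urllib.parse import urlencode

def updatecsv_split_url(core, field, separator=":", encapsulator='"'):
    """Build the UpdateCSV URL that splits one CSV column into a
    multiValued field via f.<field>.split/.separator/.encapsulator."""
    params = {
        "commit": "true",
        f"f.{field}.split": "true",
        f"f.{field}.separator": separator,     # encoded as %3A for ':'
        f"f.{field}.encapsulator": encapsulator,  # encoded as %22 for '"'
    }
    return f"/solr/{core}/update/csv?" + urlencode(params)

url = updatecsv_split_url("coll1", "features")
print(url)
```

URL-encoding the separator and encapsulator is what the %3A and %22 in the curl command above correspond to; the mailing-list archive also tends to strip the & separators between parameters.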
Thanks,
-Utkarsh
On Thu, Oct 10, 2013 at 5:10 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Didn't help.
This is the complete data: https://gist.github.com
On Sep 17, 2013, at 2:20 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
I have a copyField called allText with type text_general:
https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68
I have ~100 documents which have the text: dyson and dc44 or dc41 etc.
For example:
title
WordDelimiterFilterFactory was the culprit. Removing that fixed the problem.
Thanks,
-Utkarsh
On Tue, Sep 24, 2013 at 12:17 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
@Furkan Yes, I have run a commit; other text is searchable.
Not sure what you mean there by MultiPhraseQuery
I'd think about parsing them externally and using, say, SolrJ
to transmit the individual records to Solr.
Best,
Erick
On Mon, Sep 16, 2013 at 2:47 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Hello,
I am using UpdateCSV to load data in solr.
Currently I load this schema
I have a copyField called allText with type text_general:
https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68
I have ~100 documents which have the text: dyson and dc44 or dc41 etc.
For example:
title: Dyson DC44 Animal Digital Slim Cordless Vacuum
description: The DC44 Animal is the
To add to it, I see the exact same problem with the queries: nikon d7100,
nikon d5100, samsung ps-we450 etc.
Thanks,
-Utkarsh
On Tue, Sep 17, 2013 at 2:20 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
I have a copyField called allText with type text_general:
https://gist.github.com/utkarsh2012
Hello,
I am using UpdateCSV to load data in solr.
Currently I load this schema with a static set of values:
userid,name,age,location
john8322,John,32,CA
tom22,Tom,30,NY
But now I have this usecase where john8322 might have a state specific
dynamic field for example:
userid,name,age,location,
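The question is truncated above. Assuming the per-user dynamic columns follow Solr's *_s dynamic-field naming pattern (an assumption; the actual field name and the deal_CA_s example below are invented), generating such a CSV might look like:

```python
import csv
import io

def rows_with_dynamic_fields(users):
    """Write a CSV where any per-user extra keys (e.g. a
    state-specific deal_CA_s column) become additional headers;
    users lacking a column get an empty cell."""
    static = ["userid", "name", "age", "location"]
    # collect every dynamic column seen across all users
    dynamic = sorted({k for u in users for k in u if k not in static})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=static + dynamic)
    writer.writeheader()
    writer.writerows(users)
    return buf.getvalue()

out = rows_with_dynamic_fields([
    {"userid": "john8322", "name": "John", "age": 32, "location": "CA",
     "deal_CA_s": "10% off"},
    {"userid": "tom22", "name": "Tom", "age": 30, "location": "NY"},
])
print(out)
```

UpdateCSV accepts the extra header as long as a matching dynamic field (here *_s) is declared in schema.xml.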
bumping this one, any suggestions?
I am sure this is solrcloud 101 but I couldn't find documentation anywhere.
Thanks,
-Utkarsh
On Wed, Aug 28, 2013 at 2:37 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
I have a 3 node solrcloud cluster with 3 shards for each collection/core.
At times when
I have a 3 node solrcloud cluster with 3 shards for each collection/core.
At times when I rebuild the index say on collectionA on nodeA (shard1) via
UpdateCSV, the Cloud status page says that collectionA on nodeA (shard1)
is down.
Observations:
1. Other collections on nodeA work.
2. collectionA
understand.
Best
Erick
On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Some of the queries (not all) with special chars return no documents.
Example: queries returning no documents
q=m&m (this can be explained, when I search for m m, no documents
StandardTokenizerFactory since I need it for other searches.
Thanks,
-Utkarsh
On Tue, Aug 27, 2013 at 11:44 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Thanks for the info.
1.
http://SERVER/solr/prodinfo/select?q=o%27reilly&wt=json&indent=true&debugQuery=true returns:
{
responseHeader
.
This last is my recurring plea to ensure that the effort is of real benefit
to the user and not just something someone noticed that's actually
only useful 0.001% of the time.
Best
Erick
On Tue, Aug 27, 2013 at 5:00 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Yup, the query o'reilly worked
Some of the queries (not all) with special chars return no documents.
Example: queries returning no documents
q=m&m (this can be explained, when I search for m m, no documents are
returned)
q=o'reilly (when I search for o reilly, I get documents back)
Queries returning documents:
q=helloworld
That's a good point; we load data from pig to solr every day.
1. What we do:
A Pig job creates a csv dump, scps it over to a solr node, and the UpdateCSV
request handler loads the data into solr. A complete rebuild of the index for
about 50M documents (20GB) takes 20 mins (the pig job which pulls and processes
data
Thanks Tamanjit and Erick.
I tried out the filters, most of the usecases work except q=bestbuy. As
mentioned by Erick, that is a hard one to crack.
I am looking into DictionaryCompoundWordTokenFilterFactory but compound
words like these:
, 2013 at 4:48 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Thanks Tamanjit and Erick.
I tried out the filters, most of the usecases work except q=bestbuy. As
mentioned by Erick, that is a hard one to crack.
I am looking into DictionaryCompoundWordTokenFilterFactory but compound
words like
I have a field which holds a store name.
How can I make sure that these queries return relevant results when
searched against this field:
*Example1: Best Buy*
q=best (tokenizer filter makes this work)
q=bestbuy
q=buy (tokenizer filter makes this work)
q=best buy (lower case filter makes
Hello,
Is it possible to load a list into a solr field and query for items in that
list?
example_core1:
document1:
FieldName=user_ids
Value=8,6,1,9,3,5,7
FieldName=allText
Value=text to be searched over with title and description
document2:
FieldName=user_ids
Value=8738,624623,7272.82272,733
=tags:tag1
or use the tags to filter out results like
q=query&fq=tags:tag1
Thanks!
-Utkarsh
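For what it's worth, the multiValued-field approach described above can be sketched like this; the field names come from the question, and the actual indexing call is omitted:

```python
def doc_with_user_ids(doc_id, user_ids, all_text):
    """Represent the list as a multiValued user_ids field on the
    document; membership queries then become fq=user_ids:<id>."""
    return {"id": doc_id, "user_ids": list(user_ids), "allText": all_text}

doc = doc_with_user_ids("document1", [8, 6, 1, 9, 3, 5, 7],
                        "text to be searched over with title and description")
# filter full-text results to documents containing user 8
fq = f"user_ids:{doc['user_ids'][0]}"
print(fq)  # user_ids:8
```

Each value in a multiValued field is indexed as its own term, which is what makes the fq=user_ids:8 filter match.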
On Wed, Aug 14, 2013 at 11:57 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Thanks Aloke!
So a multivalued field assumes:
1. if data is inserted in this form: 8738,624623,7272,82272,733
the EnglishMinimalStemFilterFactory filter in the text_general
fieldType is messing up your suggestions.
On 6 August 2013 15:33, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Jack/Chris,
1. This is my complete schema.xml:
https://gist.github.com/utkarsh2012/6167128/raw/1d5ac6520b666435cd040b5cc6dcb434cdfd7925
Jack/Chris,
1. This is my complete schema.xml:
https://gist.github.com/utkarsh2012/6167128/raw/1d5ac6520b666435cd040b5cc6dcb434cdfd7925/schema.xml
More specifically, allText is of type: text_general which has a
LowerCaseFatcory during index time.
2. allText has values:
Bumping this one, is this feature maintained anymore?
Thanks,
-Utkarsh
On Fri, Aug 2, 2013 at 2:27 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
I am trying to get the autocorrect and suggest features working on my solr 4.4
setup.
As recommended here: http://wiki.apache.org/solr/Suggester
I am trying to get the autocorrect and suggest features working on my solr 4.4
setup.
As recommended here: http://wiki.apache.org/solr/Suggester, this is my
solrconfig: http://apaste.info/eBPr
Where allText is a copy field which indexes all the content I have in
document title, description etc.
I am
Thanks guys! Will play around with the function query.
Thanks,
-Utkarsh
On Tue, Jul 30, 2013 at 10:50 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: bq: I am also trying to figure out if I can place
: extra dimensions to the solr score which takes other attributes into
: consideration
We have been using New Relic (they have a free plan too); it gives all the
needed info like JVM heap usage in eden space, survivor space and old gen,
garbage collection info, and detailed info about solr requests and their
response times, error rates etc.
I highly recommend using New Relic to monitor
you'd manage to separate
the signal from the noise
Best
Erick
On Wed, Jul 24, 2013 at 4:37 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
I have a solr query which has a bunch of boost params for relevancy. This
search works fine and returns the most relevant documents as per the user
I have a solr query which has a bunch of boost params for relevancy. This
search works fine and returns the most relevant documents as per the user
query. For example, if user searches for: iphone 5, keywords like
apple, wifi etc are boosted. I get these keywords from external
training. The top
of the problem.
Best
Erick
On Mon, Jul 15, 2013 at 7:40 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
I have also tried these queries (as per this SO answer:
http://stackoverflow.com/questions/12665797/is-solr-4-0-capable-of-using-join-for-multiple-core
)
1. http://_server_.com:8983
stored="true" multiValued="false" />
Thanks,
-Utkarsh
On Tue, Jul 16, 2013 at 11:39 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Looks like the JoinQParserPlugin is throwing an NPE.
Query: localhost:8983/solr/location/select?q=*:*&fq={!join from=key
to=merchantId fromIndex=merchant}
84343345
Hello,
I am trying to join data between two cores: merchant and location
This is my query:
http://_server_.com:8983/solr/location/select?q={!join from=merchantId
to=merchantId fromIndex=merchant}walgreens
Ref: http://wiki.apache.org/solr/Join
Merchants core has documents for the query:
java.lang.Thread.run(Thread.java:662)\n,
code:500}}
Thanks,
-Utkarsh
On Mon, Jul 15, 2013 at 4:27 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Hello,
I am trying to join data between two cores: merchant and location
This is my query:
http://_server_.com:8983/solr/location/select?q={!join
that goes at the base Lucene index
and does the right thing. Or even re-indexing your
entire corpus periodically to add this kind of data.
FWIW,
Erick
On Sun, Jun 30, 2013 at 2:00 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Thanks Erick/Peter.
This is an offline process
this
into the full list to return to the user.
Solr really isn't built for this use-case, is it actually
a compelling situation?
And having your document cache set at 1M is kinda
high if you have very big documents.
FWIW,
Erick
On Fri, Jun 28, 2013 at 8:44 PM, Utkarsh Sengar
Hello,
I have a usecase where I need to retrieve the top 2000 documents matching a
query.
What are the parameters (in the query, solrconfig, schema) I should look at to
improve this?
I have 45M documents in 3node solrcloud 4.3.1 with 3 shards, with 30GB RAM,
8vCPU and 7GB JVM heap size.
I have
%:   319
95%:  364
98%:  420
99%:  453
100%: 497 (longest request)
Sometimes it takes a lot of time, sometimes its pretty quick.
Thanks,
-Utkarsh
On Fri, Jun 28, 2013 at 5:39 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Hello,
I have a usecase where I need to retrieve the top 2000
Hello,
I am trying to update schema.xml for a core in a multicore setup and this
is what I do to update it:
I have 3 nodes in my solr cluster.
1. Pick node1 and manually update schema.xml
2. Restart node1 with -Dbootstrap_conf=true
java -Dsolr.solr.home=multicore -DnumShards=3
#Command_Line_Util
This means you will NOT need to start Solr with -Dbootstrap_confdir at all.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
On 25 June 2013 at 10:29, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Hello,
I am trying to update schema.xml for a core
zkCli? Any
issues?
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
On 25 June 2013 at 11:24, Utkarsh Sengar utkarsh2...@gmail.com wrote:
But when I launch a solr instance without -Dbootstrap_conf=true, just
one core is launched and I cannot see the other core
I believe I am hitting this bug:
https://issues.apache.org/jira/browse/SOLR-4805
I am using solr 4.3.1
-Utkarsh
On Tue, Jun 25, 2013 at 2:56 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Yes, I have tried zkCli and it works.
But I also need to restart solr after the schema change right
use-case, 2-3 replicas might be okay. We don't have
enough information to answer that question.
On Sat, Jun 22, 2013 at 10:40 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
Thanks Anshum.
Sure, creating a replica will make it failure resistant, but death of
one shard should not make
Hello,
I am testing a 3 node solrcloud cluster with 3 shards. 3 zk nodes are
running as separate processes on the same machines.
I wanted to know the recommended size of a solrcloud cluster (min zk nodes?)
This is the SolrCloud dump: https://gist.github.com/utkarsh2012/5840455
And, I am not
Just to be clear here, when I say I killed a node, I just killed the
solr process on that node. zk on all 3 nodes was still running.
Thanks,
-Utkarsh
On Sat, Jun 22, 2013 at 4:01 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Hello,
I am testing a 3 node solrcloud cluster with 3
from each shard for the SolrCloud setup
to work for you.
When you kill 1 shard, you essentially are taking away 1/3 of the range of
shard key.
On Sat, Jun 22, 2013 at 4:31 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Hello,
I am testing a 3 node solrcloud cluster with 3 shards. 3 zk
,
-Utkarsh
On Thu, Jun 13, 2013 at 7:28 PM, Shawn Heisey s...@elyograg.org wrote:
On 6/13/2013 7:51 PM, Utkarsh Sengar wrote:
Sure, I will reduce the count and see how it goes. The problem I have is,
after such a change, I need to reindex everything, which is slow and
takes time (40
Looks like zk does not contain the configuration called collection1.
You can use zkCli.sh to see what's inside the configs zk node. You can
manually push config via zkCli's upconfig (not very sure how it works).
Try adding this arg: -Dbootstrap_conf=true in place of
Hello,
I am evaluating solr for indexing about 45M product catalog entries. The
catalog mainly contains title and description, which take most of the space
(other attributes are brand, category, price, etc).
The data is stored in cassandra and I am using datastax's solr (DSE 3.0.2)
which handles
and request 2000
documents in each query. At the current speed (just one machine), it will
take me ~20 days to do the initial training.
Thanks,
-Utkarsh
On Thu, Jun 13, 2013 at 6:25 PM, Shawn Heisey s...@elyograg.org wrote:
On 6/13/2013 5:53 PM, Utkarsh Sengar wrote:
*Problems
Hello,
I updated my schema to use a copyField and have triggered a reindex; 80% of
the reindexing is complete. But when I query the data, I don't see
myNewCopyFieldName being returned with the documents.
Is there something wrong with my schema, or do I need to wait for the
indexing to complete
Thanks Shawn. Find my answers below.
On Thu, May 2, 2013 at 2:34 PM, Shawn Heisey s...@elyograg.org wrote:
On 5/2/2013 3:13 PM, Utkarsh Sengar wrote:
Hello,
I updated my schema to use a copyField and have triggered a reindex, 80%
of
the reindexing is complete. Although when I query
Solr 4.0 was indexing data when the machine crashed.
Any suggestions on how to recover my index, since I don't want to delete my
data directory?
When I try to start it again, I get this error:
ERROR 12:01:46,493 Failed to load Solr core: xyz.index1
ERROR 12:01:46,493 Cause:
ERROR 12:01:46,494
Hello,
I have setup a solr4 instance (just one node) and I see this memory pattern:
[image: Inline image 1]
Physical memory is nearly full and JVM memory is ok. I have ~40M documents
(where 1 document=1KB) indexed and in production env I am planning to setup
2 solr cloud nodes.
So I have 2
, but I'm not seeing your inlined image.
It's not just you.
On Tue, Apr 16, 2013 at 7:52 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:
So I have 2 questions:
1. What is the recommended memory for those 2 nodes?
2. I am not sure what Physical memory means in the context of Solr. My
Hello,
I am evaluating solr 4.2 and ElasticSearch (I am new to both) for a search
API, where data sits in cassandra.
Getting started with elasticsearch is pretty straightforward and I was
able to write an ES river (http://www.elasticsearch.org/guide/reference/river/)
which pulls data from
-enterprise
-- Jack Krupansky
-Original Message- From: Utkarsh Sengar
Sent: Monday, April 01, 2013 6:34 PM
To: solr-user@lucene.apache.org
Subject: Getting started with solr 4.2 and cassandra
Hello,
I am evaluating solr 4.2 and ElasticSearch (I am new to both) for a search
API