Hi,
I'm facing a dilemma in choosing an indexing strategy.
My application architecture is
- I have a listing table in my DB
- For each listing, I make 3 calls to a URL datasource of a different system
I have 200k records.
The time taken to index 25 docs is 1 minute, so for 200k it might take
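For scale, the arithmetic behind that estimate (a rough projection; it assumes the per-document cost stays constant as the index grows, which it may not):

```python
# Back-of-envelope projection from the observed rate of 25 docs/minute.
docs_total = 200_000
docs_per_minute = 25
minutes = docs_total / docs_per_minute
hours = minutes / 60
days = hours / 24
print(f"{minutes:.0f} minutes = {hours:.1f} hours = {days:.1f} days")
```

That is roughly 5.6 days of wall-clock time at the observed rate, which is why the external URL calls per listing dominate this design.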
Hi,
I am a beginner with Solr.
I am trying to implement phonetic search in my application.
My fieldType definition in schema.xml:
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter
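For reference, a complete phonetic fieldType might look like the following. This is a sketch, not the original poster's configuration: the field name, the choice of DoubleMetaphoneFilterFactory, and the inject setting are all illustrative.

```xml
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="true" keeps the original token alongside the phonetic code -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="true"/>
  </analyzer>
</fieldType>
```

Using a single analyzer (no type="index"/type="query" split) keeps the index-time and query-time chains identical, which matters for phonetic matching.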
We have a system which consists of 2 shards, where every shard has a leader
and one replica.
During indexing, one of the shards (both leader and replica) was shut down.
We got two types of HTTP responses: rm=Service Unavailable and rm=OK.
From this we came to the conclusion that the shard which
Note: In SolrCloud terminology, a leader is also a replica. IOW, you have
two replicas, one of which (and it can vary over time) is elected as leader
for that shard.
The other shards remain capable of indexing even if one shard becomes
unavailable. That is expected - and desired - behavior in
Hi,
I am using Solr to search for phonetically equivalent strings.
My schema contains...
<fieldType name="text_general_doubleMetaphone" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
I am running a very simple performance experiment where I post 2000 documents
to my application, which in turn persists them to a relational DB and sends them
to Solr for indexing (synchronously, in the same request).
I am testing 3 use cases:
1. No indexing at all - ~45 sec to post 2000
Doing a standard commit after every document is a Solr anti-pattern.
commitWithin is a “near-realtime” commit in recent versions of Solr and not a
standard commit.
https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
- Mark
http://about.me/markrmiller
On Feb 12, 2014, at
Yes, committing after each document will greatly degrade performance. I
typically use autoCommit and autoSoftCommit to set the time interval
between commits, but commitWithin should have a similar effect. I often
see performance of 2000+ docs per second on the load using auto commits.
When
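The interval-based approach described above is configured in solrconfig.xml. A sketch (the intervals are illustrative, not a recommendation for any particular workload):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes to disk, but does not open a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes recently added docs visible to searches -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this in place the client simply streams documents and never issues explicit commits.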
I setup a Solr Core and populated it with documents but I am not able to get
any results when attempting to search the documents.
A generic search (q=*.*) returns all documents (and fields/values within
those documents), however when I try to search using specific criteria I get
no results back.
I absolutely agree and I even read the NRT page before posting this question.
The thing that baffles me is this:
Doing a commit after each add kills the performance.
On the other hand, when I use commitWithin and specify an (absurd) 1 ms delay,
I expect that this behavior will be equivalent to
On 12 February 2014 20:53, Maheedhar Kolla maheedhar.ko...@gmail.com wrote:
Hi ,
I need help with importing data through DIH (using Solr 3.6.1, Tomcat 6).
I see the following error when I try to do a full-import from my
local MySQL table (
I was just trying to use the SolrJ client to import XML data to a Solr server.
I read in the SolrJ wiki that SolrJ lets you upload content in XML and binary
format.
I realized there is an XML parser in Solr (we can use a dataUpdateHandler in
the Solr default UI, Solr Core > Dataimport).
So I was wondering
Hi ,
I need help with importing data through DIH (using Solr 3.6.1, Tomcat 6).
I see the following error when I try to do a full-import from my
local MySQL table ( http:/s/solr//dataimport?command=full-import
).
snip
..
<str name="Total Requests made to DataSource">0</str>
<str name="Total
It can be anything from wrong credentials, to a missing driver on the classpath,
to a malformed connection string, etc.
What does the Solr log say?
-Original Message-
From: Maheedhar Kolla [mailto:maheedhar.ko...@gmail.com]
Sent: Wednesday, 12 February 2014 17:23
To:
On 12 February 2014 20:57, leevduhl ld...@corp.realcomp.com wrote:
[...]
However, when I try to search specifically where mailingcity=redford I
don't get any results back. See the following query/results.
Query:
Cross-posting my answer from SO:
According to this wiki:
https://wiki.apache.org/solr/NearRealtimeSearch
the commitWithin is a soft-commit by default. Soft-commits are very
efficient in terms of making the added documents immediately searchable.
But! They are not on the disk yet. That means the
I'd seriously consider a SolrJ program that pulled the necessary data from
two of your systems, held it in cache and then pulled the data from your
main system and enriched it with the cached data.
Or export your information from your remote systems and import them into
a single system where you
First, why are you talking about DoubleMetaphone when
your fieldType uses BeiderMorseFilterFactory? Which points
up a basic issue you need to wrap your head around or you'll
be endlessly confused. At least I was...
Your analysis chains _must_ do compatible things at index and
query time. The
Thanks for the comments/advice. I did mess with the drivers (by
deliberately moving the libs) and it did fail as it is supposed to.
When I looked into catalina.out, I realized that the problem was that the
data directory was owned by root instead of tomcat6. I changed it
so that tomcat6 can
Hmmm, before going there let's be sure you're trying to do
what you think you are.
Solr does _not_ index arbitrary XML. There is a very
specific format of XML that describes solr documents
that _can_ be indexed. But random XML is not
supported. See the documents in example/exampledocs
for the XML
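The Solr-specific XML format mentioned above looks like the following (see example/exampledocs/*.xml in the Solr distribution; the field names here are illustrative):

```xml
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title">An example document</field>
  </doc>
</add>
```

Arbitrary XML must be transformed into this add/doc/field shape (by your own code, or by the XSLT update handler) before Solr can index it.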
On 2/12/2014 8:21 AM, Eric_Peng wrote:
I was just trying to use the SolrJ client to import XML data to a Solr server.
I read in the SolrJ wiki that SolrJ lets you upload content in XML and binary
format.
I realized there is an XML parser in Solr (we can use a dataUpdateHandler in
the Solr default
Thanks, the syntax correction solved the problem. I actually thought I had tried
that before I posted.
Thanks
Lee
--
View this message in context:
http://lucene.472066.n3.nabble.com/Newb-Search-not-returning-any-results-tp4116905p4116930.html
Sent from the Solr - User mailing list archive at Nabble.com.
Here's some additional background that may shed light on the
performance:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
Best,
Erick
On Wed, Feb 12, 2014 at 7:40 AM, Dmitry Kan solrexp...@gmail.com wrote:
Cross-posting my answer from SO:
The explicit commit will cause your app to be delayed until that commit
completes, and then Solr would be idle until that request completion makes
its way back to your app and you submit another request which finds its way
to Solr, maybe a few ms. That includes network latency. That interval of
Thanks a lot, I learnt a lot from it.
Thank you so much Erick, I will try to write my own XML parser.
Hello!
Just specify the left boundary, like: price:[900 TO 1000]
--
Regards,
Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr Elasticsearch Support * http://sematext.com/
When a user enters a price in the price field, e.g. 1000 USD, I want to fetch all
items with price
There is also an XSLT update handler option to transform raw XML to Solr
XML on the fly. If anybody here has used it, feel free to chime in.
See:
http://wiki.apache.org/solr/XsltUpdateRequestHandler
and
When a user enters a price in the price field, e.g. 1000 USD, I want to fetch all
items with a price around 1000 USD. I found in the documentation that I can use
price:[* TO 1000]. That will get all items with a price from 1 to 1000 USD,
but I want to get results where the price is between 900 and 1000 USD.
Is price a float/double field?
price:[99.5 TO 100.5] -- price near 100
price:[900 TO 1000]
or
price:[899.5 TO 1000.5]
-- Jack Krupansky
-Original Message-
From: jay67
Sent: Wednesday, February 12, 2014 12:03 PM
To: solr-user@lucene.apache.org
Subject: Using numeric ranges in Solr
Hi Robert,
I don't think this is possible at the moment, but I hope to get
https://issues.apache.org/jira/browse/SOLR-4478 in for Lucene/Solr 4.7, which
should allow you to inject your own SolrResourceLoader implementation for core
creation (it sounds as though you want to wrap the core's
Does Solr 4 load the entire index into a memory-mapped file? What is the eviction
policy of this memory-mapped file? Can we control it?
_
From: Joshi, Shital [Tech]
Sent: Wednesday, February 05, 2014 12:00 PM
To: 'solr-user@lucene.apache.org'
Subject: Solr4
No, Solr doesn't load the entire index in memory. I think you'll find
Uwe's blog most helpful on this matter:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
On Thu, Feb 13, 2014 at 12:27 AM, Joshi, Shital shital.jo...@gs.com wrote:
Does Solr4 load entire index in Memory
Shital,
Take a look at
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's
a pretty decent explanation of memory mapped files. I don't believe that the
default configuration for solr is to use MMapDirectory but even if it does my
understanding is that the entire
Hi David,
I finally got back to this again, after getting sidetracked for a couple of
weeks.
I implemented things in accordance with my understanding of what you wrote
below. Using SolrJ, the code to index the spatial field is as follows,
private void addSpatialField(double lat, double lon,
Hi all,
I am running a Solr application and I need to implement a feature that requires faceting and filtering on a large list of IDs. The IDs are stored outside of Solr and are specific to the currently logged-on user. An example of this is the articles/tweets the user has read in the last few
On 2/12/2014 12:07 PM, Greg Walters wrote:
Take a look at
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's
a pretty decent explanation of memory mapped files. I don't believe that the
default configuration for solr is to use MMapDirectory but even if it does my
Tri,
You will most likely need to implement a custom QParserPlugin to
efficiently handle what you described. Inside of this QParserPlugin you
could create the logic that would bring in your outside list of ID's and
build a DocSet that could be applied to the fq and the facet.query. I
haven't
That’s pretty weird. It appears that somehow a Spatial4j Point class is having
its toString() called on it (which looks like Pt(x=-72.544123,y=41.85)),
and then Spatial4j is trying to parse this, which isn’t a valid format; the
toString is more for debug-ability. Your SolrJ code looks
And perhaps one other, but very pertinent, recommendation: allocate only
as much heap as is necessary. By allocating more, you are working against
the OS cache. Knowing how much is enough is a bit tricky, though.
Best,
roman
On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey
Navaa,
you need query expansion for that.
E.g. if your query goes through dismax, you need to add the two field names to
the qf parameter.
The nice thing is that qf can be:
text^3.0 text.stemmed^2 text.phonetic^1
And thus exact matches are preferred to stemmed or phonetic matches.
This is
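Concretely, the request parameters for that query-expansion approach might look like this (field names and boosts are illustrative, carried over from the qf example above):

```text
defType=edismax
q=smith
qf=text^3.0 text.stemmed^2.0 text.phonetic^1.0
```

The same user query is then matched against all three fields, with exact matches on text scoring highest.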
On 2/6/2014 4:00 AM, Shawn Heisey wrote:
I would not recommend it, but if you know for sure that your
infrastructure can handle it, then you should be able to optimize them
all at once by sending parallel optimize requests with distrib=false
directly to the Solr cores that hold the shard
Hi David,
You wrote:
Perhaps you’ve got some funky UpdateRequestProcessor from experimentation
you’ve done that’s parsing then toString’ing it?
No, nothing at all. The update processing is straight out-of-the-box Solr.
And also, your stack trace should have more to it than what you
Your new code should also work, and should be equivalent.
The longer stack trace you have is of the wrapping SolrException, which wraps
another exception, InvalidShapeException. You should also see the stack trace
of InvalidShapeException, which should originate from Spatial4j.
~ David
Hi,
I'm facing a weird problem while using q.op=AND. It looks like it
gets into a conflict if I use multiple appends conditions in
conjunction. It works as long as I have one filtering condition in appends.
<lst name="appends">
  <str name="fq">Source:TestHelp</str>
</lst>
Now, the moment I
On 2/12/2014 3:32 PM, Shamik Bandopadhyay wrote:
Hi,
I'm facing a weird problem while using q.op=AND. It looks like it
gets into a conflict if I use multiple appends conditions in
conjunction. It works as long as I have one filtering condition in appends.
<lst name="appends">
  <str
Thanks a lot, Shawn. Changing the appends filtering based on your suggestion
worked. The part which confused me big time is the syntax I've been using so
far without an issue (barring the q.op part).
<lst name="appends">
  <str name="fq">Source:TestHelp | Source:downloads |
-AccessMode:internal |
On 2/12/2014 4:58 PM, shamik wrote:
Thanks a lot, Shawn. Changing the appends filtering based on your suggestion
worked. The part which confused me big time is the syntax I've been using so
far without an issue (barring the q.op part).
<lst name="appends">
  <str name="fq">Source:TestHelp |
Hello,
I use
icu4j-49.1.jar,
lucene-analyzers-icu-4.6-SNAPSHOT.jar
for one of the fields in the form
<filter class="solr.ICUFoldingFilterFactory"/>
I need to change the letter that one of the accented characters maps to. I made
changes to this file
Not a direct answer, but the usual next question is: are you
absolutely sure you are using the right jars? Try renaming them and
restarting Solr. If it complains, you got the right ones. If not
Also, unzip those jars and see if your file made it all the way
through the build pipeline.
Thanks, I'll take a look at the debug data.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Weird-issue-with-q-op-AND-tp4117013p4117047.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi Erick,
Thank you very much, those are valuable suggestions :-).
I will give it a try.
Appreciate your time.
Did you mean to use || for the OR operator? A single | is not treated as
an operator - it will be treated as a term and sent through normal term
analysis.
-- Jack Krupansky
-Original Message-
From: Shamik Bandopadhyay
Sent: Wednesday, February 12, 2014 5:32 PM
To:
Dear all gurus,
I would like to limit the number of search results per group. Let's say I have
many shops selling shirts. So when I search for white shirt, I want to return a
maximum number per shop (e.g. 5).
The result should be like this...
- Shop A
- Shop A
- Shop B
- Shop B
- Shop B
- Shop B
- Shop B
-
Chun,
Have you looked at the Grouping / Field Collapsing feature in Solr?
https://wiki.apache.org/solr/FieldCollapsing
If shop is one of your fields, you can use field collapsing on that field
with a maximum of 'n' results to return per field value (or group).
Sameer.
--
www.measuredsearch.com
tw:
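In query-parameter form, the suggestion above would be something like the following (this assumes the field holding the shop is literally named shop):

```text
q=white shirt&group=true&group.field=shop&group.limit=5
```

group.limit caps how many documents are returned within each group; without it, Solr returns only one document per group by default.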
Re-posting...
Thanks,
Anand
On 2/12/2014 10:55 AM, anand chandak wrote:
Thanks David, really helpful response.
You mentioned that if we have to add scoring support in Solr, a
possible approach would be to add a custom QueryParser, possibly
building on Lucene's join module. I have
Navaa,
You need the query to be sent to the two fields. In dismax, this is easy.
Paul
On 12 February 2014 14:22:33 EST, Navaa navnath.thomb...@xtremumsolutions.com
wrote:
Hi,
I am using Solr to search for phonetically equivalent strings;
my schema contains...
fieldType
Hi ,
I am new to Solr; I need help with the following.
PROBLEM: I have a huge file of 1 lines. I want this to act as an inclusion or
exclusion list in the query, i.e. each line combined like (line1 OR line2 OR ...).
How can this be achieved in Solr? Is there a custom implementation that I
would need to
Hi,
I am working on a prototype where I have a content source and I am indexing
all its documents, storing the index in Solr. Now I have a pre-condition: my
content source is ever-changing, meaning there is always new content added to
it. I have read that Solr indexes the full source only
You have read that Solr needs to reindex the full source. That's correct
(unless you use atomic updates). But - the important point is - this
is per document. So, once you have indexed your 1 documents, you don't
need to worry about them until they change.
Just go ahead and index your additional
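The pick-only-changed-documents step can be sketched as follows (a minimal illustration, assuming each source document carries an ISO-8601 modification timestamp in UTC; the data here is made up):

```python
last_index_time = "2014-02-11T00:00:00Z"  # recorded after the previous indexing run
docs = [
    {"id": "1", "modified": "2014-02-10T09:00:00Z"},
    {"id": "2", "modified": "2014-02-12T08:30:00Z"},
]
# ISO-8601 timestamps in the same zone compare correctly as plain strings
to_index = [d for d in docs if d["modified"] > last_index_time]
print([d["id"] for d in to_index])
```

Only the documents selected here are re-sent to Solr; everything else in the index is left untouched.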
hi,
Thanks for your reply.
I am a beginner with Solr; kindly elaborate in more detail, because in my
solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">5</int>
    <str name="df">name</str>
Thanks Alex,
Yes, my source system maintains the creation and last-modification times of
each document.
Per your inputs, can I assume that the next time Solr starts indexing,
it scans all the documents present in the source but only picks for indexing
those which are either new or have been updated since
I had this problem when I started to look at Solr as an index for a file
server. What I ended up doing was writing a perl script that did this:
- Scan the whole filesystem and create an XML that is submitted into Solr for
indexing. As this might be some 600,000 files, I break it down into
I'd start from doing Solr tutorial. It will explain a lot of things.
But in summary, you can send data to Solr (best option) or you can
pull it using DataImportHandler. Take your pick, do the tutorial,
maybe read some books. Then come back with specific questions of where
you started.
Regards,
Why write a Perl script for that?
touch new_timestamp
find . -newer timestamp | script-to-submit
mv new_timestamp timestamp
Neither approach deals with deleted files.
To do this correctly, you need lists of all the files in the index with their
timestamps, and of all the files in the
Thanks all.
I am following a couple of articles on this.
I am sending data to Solr instead of using DIH, and am able to successfully
index data in Solr.
My concern here is how to minimize Solr indexing so that only
updated data is indexed each time, out of all the data items.
Is this something
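For comparison, if you ever switch back to DIH, delta imports cover exactly this only-index-what-changed case. A sketch of the entity configuration (the table and column names are assumptions, not from the original post):

```xml
<entity name="item"
        query="SELECT id, title FROM items"
        deltaQuery="SELECT id FROM items
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, title FROM items
                          WHERE id = '${dih.delta.id}'"/>
```

DIH records last_index_time itself; deltaQuery finds the changed IDs and deltaImportQuery fetches each one for reindexing. When pushing data yourself, you have to track that timestamp in your own code.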
Hi again,
Is anybody interested in this feature for the Solr MailEntityProcessor?
WDYT?
Thanks,
Dileepa
On Thu, Jan 30, 2014 at 11:00 AM, Dileepa Jayakody
dileepajayak...@gmail.com wrote:
Hi All,
I think OAuth2 integration is a valid use case for Solr when it comes to
importing data from
At the risk of derailing the thread:
We do a lot more in the script than is mentioned here: we pull out parts of the
path and mangle them (for example, turning them into a UNC path for users to use,
or pulling out a client name or job number using a known folder structure). As for
deleted files,