On Mon, Oct 7, 2013, at 11:09 PM, user 01 wrote:
Is there any way to store documents in a fixed sort order within the indexes of
certain fields (either the arrival order, or sorted by the int ids that also
serve as my unique key), so that I could store them optimized for browsing
lists of items?
The
Thanks for your replies.
I am actually using the frange approach for now. The only downside I see there
is that it makes the function call twice, calling createWeight() twice, so my
social connections are evaluated twice, which is quite a heavy operation. So I
was thinking if I could get away with
I have a test system with an index of 15M documents in one shard
that I would like to split in two. I've tried it four times now. I have a
stand-alone ZooKeeper running on the same machine.
The end result is that I have two new shards in the "construction" state, and
each has one replica which
Hi ,
I have set up SolrCloud with Solr 4.4. The cloud has 2 Tomcat instances with a
separate ZooKeeper.
I execute the command below via the URL:
http://localhost:8180/solr/colindexer/dataimportmssql?command=full-import&commit=true&clean=false
response:
<lst name="responseHeader">
<int name="status">0</int>
<int
Hello Kalle,
we noticed the same problem some weeks ago:
http://lucene.472066.n3.nabble.com/Share-splitting-at-23-million-documents-gt-OOM-td4085064.html
Would be interesting to hear if there is more positive feedback this time.
We finally concluded that it may be worth starting with many
I have an input that can have only 2 values Published or Deprecated. What
regular expression can I use to ensure that either of the two words was
submitted?
I tried different regular expressions (as in [1], [2]) that
contain the most generic syntax, but Solr throws a parser exception when
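Since the value set is closed, one option is to validate the value client-side before it ever reaches Solr's query parser; a minimal sketch (the function name is illustrative, not part of any Solr API):

```python
import re

# Alternation over exactly the two allowed words; fullmatch anchors both ends.
_STATUS = re.compile(r"Published|Deprecated")

def is_valid_status(value):
    """Return True only if value is exactly 'Published' or 'Deprecated'."""
    return _STATUS.fullmatch(value) is not None
```

Note the match is case-sensitive as written; add re.IGNORECASE if either casing should be accepted.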
Hi Kalle,
The problem here is that certain actions are taking too long causing the
split process to terminate in between. For example, a commit on the parent
shard leader took 83 seconds in your case but the read timeout value is set
to 60 seconds only. We actually do not need to open a searcher
It looks like your select statement does not return any rows... have you
verified it with some sort of SQL client?
On Tue, Oct 8, 2013 at 8:57 AM, Prasi S prasi1...@gmail.com wrote:
Hi ,
I have setup solrcloud with solr4.4. The cloud has 2 tomcat instances with
separate zookeeper.
i
Hello,
I'm trying to deploy, using SolrCloud, a cluster of 3 Windows VMs, each
with an instance of Solr running in a Tomcat container AND with an external
ZooKeeper (3.4.5) (so 3 ZK + 3 Solr). I'm using Solr 4.2; the original conf
is multi-core (6 different cores).
I tried to set up a
My select statement returns documents; I have checked the query in the SQL
server.
The problem is that the same configuration works when given to the default
/dataimport handler. If I give it to the /dataimportmssql handler, I get this
type of behaviour.
On Tue, Oct 8, 2013 at 1:28 PM,
I found that:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
at that URL:
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping+Special+Characters
I'm using Solr 4.5. Is there any full list of special characters to escape
inside my custom search API before making a request to SolrCloud?
Actually I want to remove special characters and not send them into my
Solr indexes. I mean a user can send a special query, like a SQL injection,
and I want to protect my system from such scenarios.
2013/10/8 Furkan KAMACI furkankam...@gmail.com
I found that:
+ - && || ! ( ) { } [ ] ^ " ~ *
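A client-side escaper for that list is short; the sketch below backslash-escapes each special character individually, which also covers the two-character && and || operators since each & and | gets escaped. SolrJ ships the same idea as ClientUtils.escapeQueryChars:

```python
# Special characters from the Lucene query-parser escaping documentation.
_SPECIAL = set('+-&|!(){}[]^"~*?:\\')

def escape_query_chars(text):
    """Backslash-escape every Lucene query-parser special character."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in text)
```

Escaping preserves the characters for literal matching; if you truly want them gone (the "SQL injection" concern), dropping them instead of escaping is a one-character change to the comprehension.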
I've solved this problem myself.
If you use core discovery, you must specify the numShards parameter in
core.properties,
or else Solr won't allocate a hash range for each shard, and documents
won't be distributed properly.
Using core discovery to set up SolrCloud in Tomcat is much easier and
Thanks Erik,
I think I have been able to exhaust a resource:
if I split the data in 2 and upload it with 2 clients as in benchmark
1.1, it takes 120s; here the bottleneck is my LAN.
If I use a setting like benchmark 1, the bottleneck is probably the
ramBuffer.
I'm
Why use regular expressions at all?
Try:
published OR deprecated
-- Jack Krupansky
-Original Message-
From: Dinusha Dilrukshi
Sent: Tuesday, October 08, 2013 3:32 AM
To: solr-user@lucene.apache.org
Subject: Regex to match one of two words
I have an input that can have only 2 values
Tim,
I suggest you open a new thread and not reply to this one to get noticed.
Dmitry
On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.com wrote:
Is there a way to make autoCommit only commit if there are pending changes,
i.e.: if there are 0 adds pending commit, don't
@Upayavira:
q=topic:x1&rows=20&sort=id desc
Or
q=topic:x1&rows=20&sort=timestamp desc
Will get you what you ask for.
Yeah, I know that I could use sort and that would work, but I was asking about
an optimized way. Also, that ticket has been fixed, so shouldn't I now be able
to make use of the fixed sort
Hi there, this is my first message to this list :)
In our application we have documents split into several pages. When the
user searches for words in a document we want to return all documents
containing all the words, but we'd like to add a link to the specific
page for each highlight.
We're in the process of moving onto SolrCloud, and have gotten to the point
where we are considering how to do our hardware setup.
We're limited to VMs running on our server cluster and storage system, so
buying new physical servers is out of the question - the question is how we
should
I think Mr. Erickson summarized the issue of hardware sizing quite well in
the following article:
http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
Best regards,
Primož
From: Henrik Ossipoff Hansen h...@entertainment-trading.com
To:
I was wrong in saying that we don't need to open a searcher; we do. I
committed a fix in SOLR-5314 to use soft commits instead of hard commits. I
also increased the read timeout value. Both of these together will reduce
the likelihood of such a thing happening.
I am using 4.3. It is not related to bugs around last_index_time. The
problem is caused by the fact that the parent entity and child entity use
different data sources (different databases on different hosts).
From the log output, I do see the delta query of the child entity being
Bill,
I do not believe there is any way to tell it to use a different datasource for
the parent delta query.
If you used this approach, would it solve your problem:
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ?
James Dyer
Ingram Content Group
(615) 213-4311
I use Solr 4.5 and I have a WhitespaceTokenizer in my schema. What is the
difference (index size and performance) between these two sentences:
First one: "This is a sentence."
Second one: "This  is  a  sentence." (with consecutive spaces)
Shamik,
Are you using a request handler other than /select, and if so, did you set
shards.qt in your request? It should be set to the name of the request
handler you are using.
See http://wiki.apache.org/solr/SpellCheckComponent?#Distributed_Search_Support
James Dyer
Ingram Content Group
The result is the same and the performance difference should be negligible,
unless you're uploading megabytes of white space. Consecutive white space
should be collapsed outside of Solr/Lucene anyway because it'll end up in your
stored field. Index size will be slightly bigger but not much due to
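The token-level equivalence is easy to see with a toy tokenizer that splits on runs of whitespace (a simplification of what Lucene's WhitespaceTokenizer effectively produces, not the actual class):

```python
def whitespace_tokenize(text):
    # str.split() with no argument collapses runs of whitespace,
    # so "a  b" and "a b" yield the same token stream.
    return text.split()
```

Both sentences produce the identical token stream, so the inverted-index terms are the same; only the stored copy of the field keeps the extra spaces.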
*1. SpanNot Operator*
We have a business use case for SpanNot queries in Solr. Lucene's query
parser currently doesn't support/parse SpanNot queries.
*2. Adding Recursive and Range Proximity*
*Recursive proximity* is a proximity query within a proximity query.
Ex: “ “income
Or a boolean field for published, with false meaning deprecated.
wunder
On Oct 8, 2013, at 3:42 AM, Jack Krupansky wrote:
Why use regular expressions at all?
Try:
published OR deprecated
-- Jack Krupansky
-Original Message- From: Dinusha Dilrukshi
Sent: Tuesday, October
CREATEALIAS is also used to move an alias.
Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc.
“The Science of Influence Marketing”
18 East 41st Street
New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+:
You can index to an alias that points at only one collection. Works fine!
Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc.
“The Science of Influence Marketing”
18 East 41st Street
New York, NY 10017
t: @appinions
We are in the process of upgrading our Solr cluster to the latest and greatest
Solr Cloud. I have some questions regarding full indexing though. We're
currently running a long job (~30 hours) using DIH to do a full index on over
10M products. This process consumes a lot of memory and while
I am using a suggester that uses an external dictionary file for suggestions
(as below).
# This is a sample dictionary file.
iPhone3g
iPhone4	295
iPhone5c	620
iPhone4g	710
Everything works fine except for the fact that the suggester seems to be
case sensitive.
/suggest?q=ip
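Case-insensitivity in a suggester usually comes from the analysis chain of the field type the suggester is configured against; a schema.xml sketch along these lines (the type name is illustrative) lower-cases terms at both index and query time:

```xml
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This is a sketch of the usual approach, not a drop-in fix: whether it applies depends on which suggester implementation is reading the dictionary file.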
Hey Everyone,
When faceting on a field using the EdgeNGramFilterFactory the returned
facets values include all of the n-gram values. Is there a way to limit
this list to the stored values without creating a new field?
Thanks in advance!
Tyler
Facets do not return the stored constraints; it's usually a bad idea to
tokenize or do heavy analysis on facet fields. You need to copy your field
instead.
-Original message-
From:Tyler Foster tfos...@cloudera.com
Sent: Tuesday 8th October 2013 19:28
To: solr-user@lucene.apache.org
Tyler, faceting works on indexed content and not stored content.
On Tue, Oct 8, 2013 at 10:45 PM, Tyler Foster tfos...@cloudera.com wrote:
Hey Everyone,
When faceting on a field using the EdgeNGramFilterFactory the returned
facets values include all of the n-gram values. Is there a way to
Thanks, that was the way it was looking. I just wanted to make sure I
wasn't missing something.
On Tue, Oct 8, 2013 at 10:32 AM, Markus Jelsma markus.jel...@openindex.io wrote:
Facets do not return the stored constraints; it's usually a bad idea to
tokenize or do heavy analysis on facet
James,
Thanks for your reply. The shards.qt did the trick. I read the
documentation earlier but was not clear on the implementation, now it
totally makes sense.
Appreciate your help.
Regards,
Shamik
--
View this message in context:
Hi! We are running Solr 4.4.0 on a 3-node Linux cluster and have about 2
collections storing product data with no problems. Yesterday, I attempted to
create another one of these collections using the Collections API, but I had
forgotten to upload the config to ZooKeeper prior to making the call
The shards.qt parameter is the easiest one to forget, with the most dramatic of
consequences!
On Oct 8, 2013, at 11:10 AM, shamik sham...@gmail.com wrote:
James,
Thanks for your reply. The shards.qt did the trick. I read the
documentation earlier but was not clear on the implementation,
Hi,
We have recently migrated from Solr 3.6 to Solr 4.4. We are using the
Master/Slave configuration in Solr 4.4 (not Solr Cloud). We have noticed the
following behavior/defect.
Configuration:
===
1. The Hard Commit and Soft Commit are disabled in the configuration (we
control
Thanks for the suggestion but that won't work as I have last_modified field
in both the parent entity and child entity as I want delta import to kick
in when either change. That other approach has the same problem since the
parent and child entity uses different datasources.
Bill
On Tue, Oct
On 10/8/2013 3:01 AM, Furkan KAMACI wrote:
Actually I want to remove special characters and not send them into my
Solr indexes. I mean a user can send a special query, like a SQL injection,
and I want to protect my system from such scenarios.
There is a newer javadoc than the *very* old
Hey,
I am using Solr 4.0 with my own PostFilter implementation, which is executed
after the normal Solr query is done. This filter has a cost of 100. Is it
possible to run filter queries on the index after the execution of the post
filter?
I tried adding the below line to the url but it did not
This is my new schema.xml:
<schema name="documents">
<fields>
<field name="id" type="string" indexed="true" stored="true" required="true"
multiValued="false"/>
<field name="author" type="string" indexed="true" stored="true"
multiValued="true"/>
<field name="comments" type="text" indexed="true" stored="true"
multiValued="false"/>
<field
Yes, you've saved us all lots of time with this article. I'm about to do
the same for the old "Jetty or Tomcat?" container question ;).
Tim
On 7 October 2013 18:55, Erick Erickson erickerick...@gmail.com wrote:
Tim:
Thanks! Mostly I wrote it to have something official looking to hide
behind
Hi,
We are using auto discovery and have a use case where we want to be
able to add cores dynamically, without restarting solr.
In 4.4 we were able to
- add a directory (e.g. core1) with an empty core.properties
- call
Thank you Erick.
I will try this.
Regards
Dominique
Le 06/10/13 03:03, Erick Erickson a écrit :
Consider implementing a special field with terms of the form
accentfolded|original
For instance, you'd index something like
ecole|école
ecole|école privée
as _terms_, not broken up at all.
Now, when you
I have a genuine question with substance here. If anything, this
nonconstructive, rude response was the one trying to get noticed. Thanks for
contributing to the discussion.
Tim
On 8 October 2013 05:31, Dmitry Kan solrexp...@gmail.com wrote:
Tim,
I suggest you open a new thread and not reply to this one to
I'm curious what the later shard-local bits do, if anything?
I have a very large cluster (256 shards) and I'm sending most of my data
with a single composite, e.g. 1234!unique_id, but I'm noticing the data
is being split among many of the shards.
My guess right now is that since I'm only using
On Tue, Oct 8, 2013 at 6:29 PM, Brett Hoerner br...@bretthoerner.com wrote:
I'm curious what the later shard-local bits do, if anything?
I have a very large cluster (256 shards) and I'm sending most of my data
with a single composite, e.g. 1234!unique_id, but I'm noticing the data
is being
Router is definitely compositeId.
To be clear, data isn't being spread evenly... it's like it's *almost*
working. It's just odd to me that I'm slamming in data that's 99% of one
_route_ key yet after a few minutes (from a fresh empty index) I have 2
shards with a sizeable amount of data (68M and
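For reference, the compositeId router builds the 32-bit routing hash by combining the upper 16 bits of the route-key hash with the lower 16 bits of the document-id hash. A toy illustration of that bit composition, using CRC32 purely as a stand-in for the MurmurHash3 Solr actually uses (so the values differ from Solr's; only the structure matches):

```python
import zlib

def composite_hash(route_key, doc_id):
    """Toy model of compositeId routing: upper 16 bits of the 32-bit hash
    come from the route key, lower 16 bits from the document id.
    CRC32 stands in for Solr's MurmurHash3."""
    high = zlib.crc32(route_key.encode("utf-8")) & 0xFFFF0000
    low = zlib.crc32(doc_id.encode("utf-8")) & 0x0000FFFF
    return high | low
```

All documents sharing a route key share the upper 16 bits, so they land in the same region of the hash ring; the id-derived lower bits only spread them within that region.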
Is there a way to configure Solr 'defaults/appends/invariants' such that
the product of the 'start' and 'rows' parameters doesn't exceed a given
value? This would be to prevent deep pagination. Or would this require a
custom requestHandler?
Peter
I am having trouble trying to return a particular dynamic field only instead of
all dynamic fields.
Imagine I have a document with an unknown number of sections. Each section can
have a 'title' and a 'body'
I have each section title and body as dynamic fields such as section_title_*
and
This is my clusterstate.json:
https://gist.github.com/bretthoerner/0098f741f48f9bb51433
And these are my core sizes (note large ones are sorted to the end):
https://gist.github.com/bretthoerner/f5b5e099212194b5dff6
I've only sent heavily to 2 shards so far (I'm sharding by hour and it's
been
I don't know of any OOTB way to do that, I'd write a custom request handler
as you suggested.
Tomás
On Tue, Oct 8, 2013 at 3:51 PM, Peter Keegan peterlkee...@gmail.com wrote:
Is there a way to configure Solr 'defaults/appends/invariants' such that
the product of the 'start' and 'rows'
I'd recommend a custom first-components SearchComponent. Then it could
simply validate (or adjust) the parameters or throw an exception.
Knowing Tomás - that's probably what he'd really do :)
Erik
On Oct 8, 2013, at 19:34, Tomás Fernández Löbbe tomasflo...@gmail.com wrote:
I don't
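Whatever hosts the check (a first-components SearchComponent in Java, per Erik's suggestion), the validation itself is tiny. A sketch of just the guard, using the start×rows product the original question described, with a hypothetical cap:

```python
MAX_WINDOW = 100000  # hypothetical cap; tune to what your heap tolerates

def check_pagination(start, rows):
    """Reject requests that page too deep. This guards the product of
    start and rows, as in the original question; start + rows is another
    common choice since that is the window a node must actually collect."""
    if start * rows > MAX_WINDOW:
        raise ValueError(
            "start*rows=%d exceeds limit %d" % (start * rows, MAX_WINDOW))
```

In a real SearchComponent the ValueError would instead become a SolrException returned to the client.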
On Tue, Oct 8, 2013 at 7:31 PM, Brett Hoerner br...@bretthoerner.com wrote:
This is my clusterstate.json:
https://gist.github.com/bretthoerner/0098f741f48f9bb51433
And these are my core sizes (note large ones are sorted to the end):
https://gist.github.com/bretthoerner/f5b5e099212194b5dff6
I have a silly question, how do I query a single shard in SolrCloud? When I
hit solr/foo_shard1_replica1/select it always seems to do a full cluster
query.
I can't (easily) do a _route_ query before I know what each have.
On Tue, Oct 8, 2013 at 7:06 PM, Yonik Seeley ysee...@gmail.com wrote:
Ignore me I forgot about shards= from the wiki.
On Tue, Oct 8, 2013 at 7:11 PM, Brett Hoerner br...@bretthoerner.com wrote:
I have a silly question, how do I query a single shard in SolrCloud? When
I hit solr/foo_shard1_replica1/select it always seems to do a full cluster
query.
I can't
The queue size shouldn't really be too large; the whole point of
the concurrency is to keep from waiting around for the
communication with the server in a single thread. So having
a bunch of stuff backed up in the queue isn't buying you anything.
And you can always increase the memory allocated to
On 10/8/2013 6:12 PM, Brett Hoerner wrote:
Ignore me I forgot about shards= from the wiki.
On Tue, Oct 8, 2013 at 7:11 PM, Brett Hoerner br...@bretthoerner.com wrote:
I have a silly question, how do I query a single shard in SolrCloud? When
I hit solr/foo_shard1_replica1/select it always
DIH works with SolrCloud as far as I understand. But
moving to SolrJ has several advantages:
1) You have more control over your process, better
ability to debug etc.
2) If you can partition your data up amongst
several clients, you can probably get through your jobs
much faster.
3) You're not
Hmmm, seems like it should. What's our evidence that it isn't working?
Best,
Erick
On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hey,
I am using solr 4.0 with my own PostFilter implementation which is executed
after the normal solr query is done. This filter has a
Hmmm, that is odd; the glob dynamicField should
pick this up.
Not quite sure what's going on. You can parse the file
via Tika yourself and look at what's in there; it's a relatively
simple SolrJ program, here's a sample:
http://searchhub.org/2012/02/14/indexing-with-solrj/
Best,
Erick
On Tue,
On 10/7/2013 6:02 AM, Dharmendra Jaiswal wrote:
I am using Solr 4.4 version with SolrCloud on Windows machine.
Somehow i am not able to share schema between multiple core.
If you're in SolrCloud mode, then you already *are* sharing your
schema. You are also sharing your configuration. Both
On Tue, Oct 8, 2013 at 8:27 PM, Shawn Heisey s...@elyograg.org wrote:
There is also the distrib=false parameter that will cause the request to
be handled directly by the core it is sent to rather than being
distributed/balanced by SolrCloud.
Right - this is probably the best option for
On 10/7/2013 12:36 AM, user 01 wrote:
what's the way to warm up filter queries for a category field with 1000
possible values? Would I need to write 1000 lines manually in the
solrconfig.xml, or what is the format?
Erick has given you awesome advice. Here's something a little bit
different
On 10/7/2013 3:08 PM, Mark wrote:
Some specific questions:
- When working with HttpSolrServer should we keep around instances for ever or
should we create a singleton that can/should be used over and over?
- Is there a way to change the collection after creating the server or do we
need to
Repeated the experiments on a local system: single-shard SolrCloud with a
replica. Tried to index 10K docs. All the indexing operations were
redirected to the replica Solr node. While the documents were getting indexed
on the replica, I shut down the leader Solr node. Out of 10K docs, only 9900
docs got
Hi,
I don't seem to be able to find any info on the possibility of getting stats
on dynamic fields. stats=true&stats.field=xyz_* appears to literally treat
xyz_* as the field name, with a star. Is there a way to get stats on
dynamic fields without explicitly listing them in the query?
Thanks!
Li
The attachment did not go through - try using pastebin.com or something.
Are you adding docs with curl one at a time, or in bulk per request?
- Mark
On Oct 8, 2013, at 9:58 PM, Saurabh Saxena ssax...@gopivotal.com wrote:
Repeated the experiments on local system. Single shard Solrcloud with a
Right - update aliases should only map an alias to one collection, but are
perfectly valid.
Read aliases can map to multiple collections or just one.
There is currently only a create alias command and not an update alias command.
I suppose because the impl for create just happened to work for
I'd suggest that each of your source document sections would be a distinct
solr document. All of the sections could have a source document ID field
to tie them together.
Dynamic fields work best when used in moderation. Your use case seems like
an excessive use of dynamic fields.
-- Jack
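A sketch of that flattening step (field names are hypothetical), producing one Solr document per section, each tagged with the shared source-document id:

```python
def flatten_sections(source_id, sections):
    """Expand one source document into per-section Solr documents,
    each carrying the source document id to tie them back together."""
    return [
        {
            "id": "%s_%d" % (source_id, page),
            "source_doc_id": source_id,
            "section_title": section["title"],
            "section_body": section["body"],
            "page": page,
        }
        for page, section in enumerate(sections, start=1)
    ]
```

At query time, grouping or a join on source_doc_id reassembles the sections, and each hit already knows its page number for the highlight link.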
Pastebin link: http://pastebin.com/cnkXhz7A
I am doing a bulk request. I am uploading 100 files, each file having 100
docs.
-Saurabh
On Tue, Oct 8, 2013 at 7:39 PM, Mark Miller markrmil...@gmail.com wrote:
The attachment did not go through - try using pastebin.com or something.
Are you