Hi,
I have a machine with slow storage and not enough RAM to hold the whole
index. This causes the first queries (~5000) to be very slow
(they are read from disk and my CPU is in iowait most of the time), and after
that reads from the index become very fast and read mainly
Hi,
The easiest solution would be to have timestamp indexed. Is there any issue
in doing re-indexing?
If you want to process records in batches then you need an ordered list and a
bookmark. You require a field to sort on, and you maintain a counter / last id
as the bookmark. This is mandatory to solve your
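The sort-field-plus-bookmark pattern above can be sketched as follows. This is a stdlib-only illustration (class and method names are made up for this sketch); the `List<Long>` stands in for an id-sorted Solr result source, where against a real index you would filter with `id:{lastId TO *]` and `sort=id asc`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BookmarkBatcher {
    // Return the next batch of ids strictly greater than the bookmark.
    static List<Long> nextBatch(List<Long> sortedIds, long lastId, int batchSize) {
        List<Long> batch = new ArrayList<>();
        for (long id : sortedIds) {
            if (id > lastId) {
                batch.add(id);
                if (batch.size() == batchSize) break;
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        long bookmark = 0L;                         // last id already processed
        List<Long> processed = new ArrayList<>();
        while (true) {
            List<Long> batch = nextBatch(ids, bookmark, 2);
            if (batch.isEmpty()) break;
            processed.addAll(batch);                // "process" the batch
            bookmark = batch.get(batch.size() - 1); // advance the bookmark
        }
        System.out.println(processed);              // every id, in order, exactly once
    }
}
```

Because the bookmark only ever moves forward, a crashed run can resume from the last persisted id without reprocessing earlier records.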
When I look at my dashboard I see 27.30 GB available for the JVM; 24.77
GB is gray and 16.50 GB is black. I am not doing anything on my machine right
now. Is it caching documents, or is there a problem? How can I find out?
Thanks, Erick.
I have tried it four times. It keeps failing.
The problem reoccurred today.
Thanks.
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, July 29, 2013 2:44 AM
To: solr-user@lucene.apache.org
Subject: Re: new field type - enum field
You
Hi,
Not sure if this was already answered, but...
If the source of the problem is overly general queries, I would try to
eliminate or minimize that. For example:
* offering query autocomplete functionality can have an effect on query
length and precision
* showing related searches (derived
Hi,
when making a backup snapshot using the /replication?command=backup call,
a snapshot directory is created and starts to fill, but the corresponding
.lock file is not created, so it's impossible to tell when the backup is
finished. I've taken a look at the code and it seems to me that
lock.obtain()
I am searching for keywords like this:
lang:en AND url:book pencil cat
It returns results, but none of them includes all of the book, pencil and
cat keywords. How should I rewrite my query?
I tried this:
lang:en AND url:(book AND pencil AND cat)
and it looks OK. However, this does not:
Hello!
Try turning on debugQuery and see what is happening. From what I see
you are searching for the en term in the lang field, the book term in the url
field, and the pencil and cat terms in the default search field, but
from your second query I see that you would like to find the last two
terms in the url field.
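The pitfall above comes from the Lucene query syntax: a `field:` prefix binds only to the next term, so unqualified terms fall back to the default field. Grouping with parentheses keeps every term on the intended field. A small sketch (the helper name is made up for this illustration):

```java
import java.util.List;
import java.util.StringJoiner;

public class FieldedQuery {
    // Build e.g. url:(book AND pencil AND cat) so every term stays on the field.
    static String allTermsInField(String field, List<String> terms) {
        StringJoiner joiner = new StringJoiner(" AND ", field + ":(", ")");
        for (String t : terms) joiner.add(t);
        return joiner.toString();
    }

    public static void main(String[] args) {
        String q = "lang:en AND " + allTermsInField("url", List.of("book", "pencil", "cat"));
        System.out.println(q); // lang:en AND url:(book AND pencil AND cat)
    }
}
```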
Hi Artem,
I noticed this recently too. I created a JIRA issue here:
https://issues.apache.org/jira/browse/SOLR-5040
Cheers,
Mark
Artem Karpenko a.karpe...@oxseed.com writes:
Hi,
when making a backup snapshot using /replication?command=backup
call, a snapshot directory is created and
Something interesting I have noticed today,
after running my huge single index (49 million records / 137 GB index) for
about a week and replicating today, I noticed that the heap usage after
replication did not go down as expected. Expected means: if Solr is started
I have a heap size between 4 and 5
When I send that query:
select?pf=url^10+title^8&fl=url,content,title&start=0&q=lang:en+AND+(cat+AND+dog+AND+pencil)&qf=content^5+url^8.0+title^6&wt=xml&debugQuery=on
It is debugged as:
+(+lang:en +(+(content:cat^5.0 | title:cat^6.0 | url:cat^8.0)
+(content:dog^5.0 | title:dog^6.0 | url:dog^8.0)
Because you specified the search fields to use with 'qf', which overrides
the default search field.
Franck Brisbart
On Monday, 29 July 2013 at 13:01 +0300, Furkan KAMACI wrote:
When I send that query:
q=price_1_1:[197 TO 249] and q=*:*&fq=price_1_1:[197 TO 249] return 2
records,
but I have two records with price_1_1 = 249; it seems that the upper
bound of the range is exclusive and I can't figure out why. Can you help me?
<dynamicField name="price_*" type="tfloat" indexed="true"/>
<fieldType
No SolrJ doesn't provide this automatically. You'd be providing the
counter by inserting it into the document as you created new docs.
You could do this with any kind of document creation you are
using.
Best
Erick
On Mon, Jul 29, 2013 at 2:51 AM, Aditya findbestopensou...@gmail.com wrote:
Hi,
OK, if you can attach it to an e-mail, I'll attach it.
Just to check, though, make sure you're logged in. I've been fooled
once or twice by being automatically signed out...
Erick
On Mon, Jul 29, 2013 at 3:17 AM, Elran Dvir elr...@checkpoint.com wrote:
Thanks, Erick.
I have tried it four
SOLR-4816 won't address this - it will just speed up *different* parts. There
are other things that will need to be done to speed up that part.
- Mark
On Jul 26, 2013, at 3:53 PM, Erick Erickson erickerick...@gmail.com wrote:
This is currently a hard-coded limit, from what I've understood. From
Hi,
I have a huge volume of DB records, close to 250 million.
I am going to use DIH to index the data into Solr.
I need the best architecture to index and query the data in an efficient
manner.
I am using Windows Server 2008 with 16 GB RAM, a Xeon processor, and Solr 4.4.
With Regards,
On 29 July 2013 17:30, Santanu8939967892 mishra.sant...@gmail.com wrote:
Hi,
I have a huge volume of DB records, which is close to 250 millions.
I am going to use DIH to index the data into Solr.
I need a best architecture to index and query the data in an efficient
manner.
[...]
This is
The initial question is not how to index the data, but how you want to use
or query the data. Use cases for query and data access should drive the data
model that you will use to index the data.
So, what are some sample queries? How will users want to search and access
the data? What data
Thanks Mark!
On 29.07.2013 at 12:32, Mark Triggs wrote:
Hi Artem,
I noticed this recently too. I created a JIRA issue here:
https://issues.apache.org/jira/browse/SOLR-5040
Cheers,
Mark
Artem Karpenko a.karpe...@oxseed.com writes:
Hi,
when making a backup snapshot using
Square brackets are inclusive and curly braces are exclusive for range
queries.
I tried a similar example with the standard Solr example and it works fine:
curl "http://localhost:8983/solr/update?commit=true" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1", "price_f": 249}]'
curl
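The inclusive/exclusive bracket semantics described above can be sketched as a predicate (class and method names made up for this illustration):

```java
// Solr/Lucene range-query semantics: [a TO b] includes both endpoints,
// {a TO b} excludes them, and mixed forms like [a TO b} are allowed.
public class RangeSemantics {
    static boolean matches(double v, double lo, double hi, boolean incLo, boolean incHi) {
        boolean aboveLo = incLo ? v >= lo : v > lo;
        boolean belowHi = incHi ? v <= hi : v < hi;
        return aboveLo && belowHi;
    }

    public static void main(String[] args) {
        // price_1_1:[197 TO 249] -- 249 is included
        System.out.println(matches(249, 197, 249, true, true));  // true
        // price_1_1:[197 TO 249} -- 249 is excluded
        System.out.println(matches(249, 197, 249, true, false)); // false
    }
}
```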
This is interesting... How are you measuring the heap size?
-Michael
-Original Message-
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
Sent: Monday, July 29, 2013 5:34 AM
To: solr-user@lucene.apache.org
Subject: swap and GC
Something interesting I have noticed today, after
Hi Jack,
My sample query will be with a keyword (text) and probably 2 to 3
filters.
There is a Java interface for displaying the data, which will consume a class,
and the class returns a data set object using SolrJ.
So for display we will use a list for binding. We may display 20 or 30 meta
data
You neglected to provide information about the filters or the 20 or 30
metadata items.
Did you mean to imply that you will not be querying against the metadata
(only returning it)?
-- Jack Krupansky
-Original Message-
From: Santanu8939967892
Sent: Monday, July 29, 2013 9:41
what query parser should I use? http://wiki.apache.org/solr/SolrQuerySyntax
Differences From Lucene Query Parser
Differences in the Solr Query Parser include
Range queries [a TO z], prefix queries a*, and wildcard queries a*b are
constant-scoring (all matching documents get an equal
Hi,
Is it possible to construct a query in Solr that is restricted to only
those documents that have a field value in a particular set of values,
similar to what would be done in Postgres with the SQL
query:
SELECT date_deposited FROM stats
Hello
One line on my debugQuery of a query is
2.1706323e-6 = score(doc=49578,freq=1.0 = termfreq=1.0), product of:
I wanted to know what doc= means. It seems to be something used in the
fieldWeight, but on the other hand it is the same for all fields in the
document, regardless of the query.
Ben,
This could be constructed as so:
fl=date_deposited&fq=date[2013-07-01T00:00:00Z TO
2013-07-31T23:59:00Z]&fq=collection_id(1 2 n)&q.op=OR
The parentheses around the (1 2 n) set indicate a boolean query, and we're
ensuring its clauses are OR'ed by the q.op parameter.
This should get you the
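The SQL `IN (...)` equivalent described above boils down to a parenthesized filter whose values are OR'ed together; making the OR explicit avoids depending on the global q.op setting. A sketch (helper name made up for this illustration):

```java
import java.util.List;

public class InFilter {
    // Build e.g. collection_id:(1 OR 2 OR 7), the fq equivalent of SQL IN (1, 2, 7).
    static String inSet(String field, List<String> values) {
        return field + ":(" + String.join(" OR ", values) + ")";
    }

    public static void main(String[] args) {
        System.out.println(inSet("collection_id", List.of("1", "2", "7")));
        // collection_id:(1 OR 2 OR 7)
    }
}
```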
I'm setting up SolrCloud with around 600 million documents. The basic
structure of each document is:
stories_id: integer, media_id: integer, sentence: text_en
We have a number of stories from different media and we treat each sentence
as a separate document because we need to run sentence level
On 7/29/2013 6:00 AM, Santanu8939967892 wrote:
Hi,
I have a huge volume of DB records, which is close to 250 millions.
I am going to use DIH to index the data into Solr.
I need a best architecture to index and query the data in an efficient
manner.
I am using windows server 2008 with 16
Hi,
doc is the internal docId of the index.
Each doc in the index has an internal id. It starts from 0 (the first doc
inserted in the segment), 1 for the second, ...
Franck Brisbart
On Monday, 29 July 2013 at 15:34 +0100, Bruno René Santos wrote:
Hello
One line on my debugQuery of a query is
Hi, I am using Solr 4.3.1 with 2 shards and a replication factor of 1,
running on Apache Tomcat 7.0.42 with an external ZooKeeper 3.4.5.
When I query select?q=*:*
I only get the number of documents found, but no actual documents. When I
query with rows=0, I do get the correct count of documents in the
Hi,
We are using the field collapsing feature with multiple shards. We ran into
Out of Memory errors on one of the shards. We use field collapsing on a
particular field which has only one specific value on the shard that goes
out of memory. Interestingly, the Out of Memory error recurred multiple
Denormalize. Add media_set_id to each sentence document. Done.
wunder
On Jul 29, 2013, at 7:58 AM, David Larochelle wrote:
I'm setting up SolrCloud with around 600 million documents. The basic
structure of each document is:
stories_id: integer, media_id: integer, sentence: text_en
We
The following query works well for me
http://[]:8983/solr/vault/select?q=VersionComments%3AWhite
returns all the documents where the version comments include White.
I try to omit the field name and put it as a default value as follows: in
solrconfig.xml I write
<requestHandler name="/select"
Hi,
df is a single-valued parameter. Only one field can be the default field.
To query multiple fields, use the (e)dismax query parser:
http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29
From: Mysurf Mail stammail...@gmail.com
To:
Nitin,
You need to ensure the fields you wish to see are marked stored="true" in your
schema.xml file, and you should include the fields in your fl= parameter
(fl=*,score is a good place to start).
Jason
On Jul 29, 2013, at 8:08 AM, Nitin Agarwal 2nitinagar...@gmail.com wrote:
Hi, I am using Solr
Or use the copyField technique to a single searchable field and set df= to that
field. The example schema does this with the field called text.
On Jul 29, 2013, at 8:35 AM, Ahmet Arslan iori...@yahoo.com wrote:
Hi,
df is a single valued parameter. Only one field can be a default field.
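A minimal schema sketch of the copyField approach mentioned above, assuming hypothetical source field names; only the catch-all field pattern itself is standard:

```xml
<!-- one catch-all searchable field; it need not be stored -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="VersionComments" dest="text"/>
<copyField source="url" dest="text"/>
<!-- then point the default field at it, e.g. in the /select handler defaults:
     <str name="df">text</str> -->
```

With this in place, a bare query like q=White searches every copied field at once without any per-field syntax.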
We'd like to be able to easily update the media-set-to-source mapping. I'm
concerned that if we store the media_sets_id in the sentence documents, it
will be very difficult to add an additional media-set-to-source mapping. I
imagine that adding a new media set would either require reimporting all
600
Jason, all my fields are set with stored=true and indexed=true, and I used
select?q=*:*&fl=*,score
but I still get the same response:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">138</int>
<lst name="params">
<str name="fl">*,score</str>
<str
: fl=date_depositedfq=date[2013-07-01T00:00:00Z TO
2013-07-31T23:59:00Z]fq=collection_id(1 2 n)q.op=OR
typo -- the colon is missing...
fq=collection_id:(1 2 n)
if you don't want the q.op to apply globally to your request, you can also
scope it only for that filter. likewise the field_name:
Check the /select request handler in solrconfig.xml. See if it sets defaults
for start or rows. start is the zero-based offset of the first document to
return, and rows is the number of documents to actually return in the response
(nothing to do with numFound). The internal Solr default is rows=10, but you can set it to 20,
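The start/rows arithmetic above can be sketched in a couple of lines (class and method names made up for this illustration):

```java
// start is a zero-based offset into the full result set, rows is the page
// size, and numFound is unaffected by either parameter.
public class Paging {
    static int startForPage(int page, int rows) {   // page is 1-based
        return (page - 1) * rows;
    }

    static String selectUrl(String q, int page, int rows) {
        return "/select?q=" + q + "&start=" + startForPage(page, rows) + "&rows=" + rows;
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("*:*", 3, 20)); // /select?q=*:*&start=40&rows=20
    }
}
```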
I'll try reindexing the timestamp.
The id-creation approach suggested by Erick sounds attractive, but the
Nutch/Solr integration seems rather tight. I don't know where to break in to
insert the id into Solr.
On Mon, Jul 29, 2013 at 4:11 AM, Erick Erickson erickerick...@gmail.comwrote:
No SolrJ
A join may seem clean, but it will be slow and (currently) doesn't work in a
cluster.
You find all the sentences in a media set by searching for that set id and
requesting only the sentence_id (yes, you need that). Then you reindex them.
With small documents like this, it is probably fairly
I am using Solr 4.3.1. I did a hard commit after indexing.
I think you're right that the node was still recovering. I didn't think so
since it didn't show up as yellow recovering on the visual display, but
after quite a while it went from Down to Active . Thanks!
On Fri, Jul 26, 2013 at 7:59 PM,
: Here is what my schema looks like
what is your uniqueKey field?
I'm going to bet it's tn_lookup_key_id, and I'm going to bet your
lowercase fieldType has an interesting analyzer on it.
You are probably hitting a situation where the analyzer you have on your
uniqueKey field is munging the
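If that diagnosis is right, the usual fix is to keep the uniqueKey field on an unanalyzed type such as solr.StrField. A sketch reusing the field name from the thread (the type assignment is assumed):

```xml
<!-- an unanalyzed key: two ids that differ only in case stay distinct docs -->
<field name="tn_lookup_key_id" type="string" indexed="true" stored="true"/>
<uniqueKey>tn_lookup_key_id</uniqueKey>
<!-- if case-insensitive *search* on the key is still needed, copyField it
     into a separate lowercase-analyzed field instead of analyzing the key -->
```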
On Jul 29, 2013, at 12:49 PM, Katie McCorkell katiemccork...@gmail.com wrote:
I didn't think so
since it didn't show up as yellow recovering on the visual display, but
after quite a while it went from Down to Active . Thanks!
Thanks, I think we should improve this! We should publish a
Why wouldn't it? Or are you saying that the routing to replicas
from the leader is also 10/packet? Hmmm, hadn't thought of that...
On Mon, Jul 29, 2013 at 7:58 AM, Mark Miller markrmil...@gmail.com wrote:
SOLR-4816 won't address this - it will just speed up *different* parts. There
are other
Mishra,
What if you set up DIH with a single SQLEntityProcessor without caching; does
that work for you?
On Mon, Jul 29, 2013 at 4:00 PM, Santanu8939967892 mishra.sant...@gmail.com
wrote:
Hi,
I have a huge volume of DB records, which is close to 250 millions.
I am going to use DIH to index
Hello,
Does anyone have experience with using Pentaho Kettle for processing
RDBMS data and pouring it into Solr? Isn't it some sort of replacement for
the DIH?
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Erick, I had typed tn_lookup_key_id as lowercase and it was defined as
<fieldType name="lowercase" class="solr.TextField"
positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Nitin
Hi all,
we have
- 70 million to 100 million documents
and we want
- 800 requests per second
How many servers (Amazon EC2 / real hardware) do we need for this?
Solr 4.x with SolrCloud, or better shards with a load balancer?
Is there anyone here who can give me some information, or who operates a similar
: I have a slow storage machine and non sufficient RAM for the whole index to
: store all the index. This causes the first queries (~5000) to be very slow
...
: Secondly I thought of initiating a new searcher event listener that queries
: on docs that were inserted since the last hard
On 7/29/2013 2:18 PM, Torsten Albrecht wrote:
we have
- 70 mio documents to 100 mio documents
and we want
- 800 requests per second
How many servers Amazon EC2/real hardware we Need for this?
Solr 4.x with solr cloud or better shards with loadbalancer?
Is anyone here who can give me some
I am currently using Solr 4.4 but am not planning to use SolrCloud in the very
near future.
I have a 3 master / 3 slave setup. Each master is linked to its corresponding
slave. I have disabled auto polling.
We do both push (using MQ) and pull indexing using a SolrJ indexing program.
I have enabled
I need some advice on the best way to implement Batch indexing with soft
commit / Push indexing (via queue) with soft commit when using SolrCloud.
I am trying to figure out a way to:
1. Make the push indexing available almost real time (using soft commit)
without degrading the search /
I am currently using SOLR 4.4. but not planning to use solrcloud in very
near
future.
I have 3 master / 3 slave setup. Each master is linked to its
corresponding
slave.. I have disabled auto polling..
We do both push (using MQ) and pull indexing using SOLRJ indexing
program.
I have enabled
I am indexing more than 300 million records, it takes less than 7 hours to
index all the records..
Send the documents in batches and also use CUSS
(ConcurrentUpdateSolrServer)
for multi-threading support.
Ex:
ConcurrentUpdateSolrServer server = new
ConcurrentUpdateSolrServer(solrServer,
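The SolrJ snippet above is cut off; independent of the client class, the "send the documents in batches" advice reduces to splitting the document stream into fixed-size chunks before handing each chunk to the server. A stdlib-only sketch of the chunking (class and method names made up for this illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class Batches {
    // Split a document list into consecutive chunks of at most batchSize.
    static <T> List<List<T>> split(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = List.of(1, 2, 3, 4, 5);
        System.out.println(split(docs, 2)); // [[1, 2], [3, 4], [5]]
    }
}
```

Each resulting chunk would then go out as one add request, amortizing per-request overhead across many documents.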
Can you compare with the old geo handler as a baseline?
Bill Bell
Sent from mobile
On Jul 29, 2013, at 4:25 PM, Erick Erickson erickerick...@gmail.com wrote:
This is very strange. I'd expect the first few queries
to be slow while the caches were being
warmed, but after that I'd expect
Yes, the internal document forwarding path is different and does not use the
CloudSolrServer. It currently works with a buffer of 10.
- Mark
On Jul 29, 2013, at 3:10 PM, Erick Erickson erickerick...@gmail.com wrote:
Why wouldn't it? Or are you saying that the routing to replicas
from the