Merged segment warmer Solr 4.4

2013-07-29 Thread Manuel Le Normand
Hi, I have a slow storage machine and insufficient RAM to hold the whole index. This causes the first queries (~5000) to be very slow (they are read from disk and my CPU is in iowait most of the time), and after that the reads from the index become very fast and read mainly

Re: processing documents in solr

2013-07-29 Thread Aditya
Hi, The easiest solution would be to have the timestamp indexed. Is there any issue in re-indexing? If you want to process records in batches then you need an ordered list and a bookmark. You require a field to sort on and must maintain a counter / last id as the bookmark. This is mandatory to solve your
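The "ordered list plus bookmark" idea above can be sketched roughly as follows. This is only an illustration under assumptions: the documents are plain dicts held in memory, the sort field is a hypothetical numeric `id`, and `fetch_batch` stands in for whatever sorted query you would actually send to Solr.

```python
from typing import Dict, Iterator, List, Optional


def fetch_batch(docs: List[Dict], last_id: Optional[int], batch_size: int) -> List[Dict]:
    """Return the next batch of docs whose 'id' is greater than the bookmark.

    Hypothetical stand-in for a query sorted on the bookmark field.
    """
    ordered = sorted(docs, key=lambda d: d["id"])
    remaining = [d for d in ordered if last_id is None or d["id"] > last_id]
    return remaining[:batch_size]


def iterate_batches(docs: List[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Walk the whole collection in order, remembering the last id seen."""
    last_id = None  # the bookmark; None means "start from the beginning"
    while True:
        batch = fetch_batch(docs, last_id, batch_size)
        if not batch:
            break
        yield batch
        last_id = batch[-1]["id"]  # advance the bookmark
```

Because the bookmark is a value rather than an offset, the walk can be resumed later by persisting `last_id` somewhere durable.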

RAM Usage Debugging

2013-07-29 Thread Furkan KAMACI
When I look at my dashboard I see that 27.30 GB is available for the JVM, 24.77 GB is gray and 16.50 GB is black. I am not doing anything on my machine right now. Did it cache documents, or is there a problem? How can I find out?

RE: new field type - enum field

2013-07-29 Thread Elran Dvir
Thanks, Erick. I have tried it four times. It keeps failing. The problem reoccurred today. Thanks. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, July 29, 2013 2:44 AM To: solr-user@lucene.apache.org Subject: Re: new field type - enum field You

Re: Two-steps queries with different sorting criteria

2013-07-29 Thread Otis Gospodnetic
Hi, Not sure if this was already answered, but... If the source of the problem is overly general queries, I would try to eliminate or minimize that. For example: * offering query autocomplete functionality can have an effect on query length and precision * showing related searches (derived

.lock file not created when making a backup snapshot

2013-07-29 Thread Artem Karpenko
Hi, when making a backup snapshot using the /replication?command=backup call, a snapshot directory is created and starts to be filled, but the appropriate .lock file is not created, so it's impossible to check when the backup is finished. I've taken a look at the code and it seems to me that lock.obtain()

AND Queries

2013-07-29 Thread Furkan KAMACI
I am searching with a query like this: lang:en AND url:book pencil cat It returns results, but none of them includes all of the book, pencil and cat keywords. How should I rewrite my query? I tried this: lang:en AND url:(book AND pencil AND cat) and it looks OK. However this not:

Re: AND Queries

2013-07-29 Thread Rafał Kuć
Hello! Try turning on debugQuery and see what is happening. From what I see you are searching for the en term in the lang field, the book term in the url field, and the pencil and cat terms in the default search field, but from your second query I see that you would like to find the last two terms in the url.
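To make the grouping point concrete, here is a small sketch that builds the parenthesised form so every term is tied to the same field. The helper name `all_terms_in_field` is made up for illustration; only the resulting query string comes from the thread.

```python
from typing import Iterable


def all_terms_in_field(field: str, terms: Iterable[str]) -> str:
    """Group the terms in parentheses so each one is searched in `field`.

    Without the parentheses, only the first term is bound to the field and
    the rest fall through to the default search field.
    """
    joined = " AND ".join(terms)
    return f"{field}:({joined})"


query = "lang:en AND " + all_terms_in_field("url", ["book", "pencil", "cat"])
print(query)
```

Running the same query with debugQuery turned on should then show all three terms parsed against url.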

Re: .lock file not created when making a backup snapshot

2013-07-29 Thread Mark Triggs
Hi Artem, I noticed this recently too. I created a JIRA issue here: https://issues.apache.org/jira/browse/SOLR-5040 Cheers, Mark Artem Karpenko a.karpe...@oxseed.com writes: Hi, when making a backup snapshot using /replication?command=backup call, a snapshot directory is created and

swap and GC

2013-07-29 Thread Bernd Fehling
Something interesting I noticed today: after running my huge single index (49 million records / 137 GB index) for about a week and replicating today, I saw that the heap usage after replication did not go down as expected. Expected means that when Solr is started I have a heap size between 4 to 5

Re: AND Queries

2013-07-29 Thread Furkan KAMACI
When I send that query: select?pf=url^10+title^8&fl=url,content,title&start=0&q=lang:en+AND+(cat+AND+dog+AND+pencil)&qf=content^5+url^8.0+title^6&wt=xml&debugQuery=on It is debugged as: +(+lang:en +(+(content:cat^5.0 | title:cat^6.0 | url:cat^8.0) +(content:dog^5.0 | title:dog^6.0 | url:dog^8.0)

Re: AND Queries

2013-07-29 Thread fbrisbart
Because you specified the search fields to use with 'qf', which overrides the default search field. Franck Brisbart On Monday, 29 July 2013 at 13:01 +0300, Furkan KAMACI wrote: When I send that query:

solr query range upper exclusive

2013-07-29 Thread alin1918
q=price_1_1:[197 TO 249] and q=*:*&fq=price_1_1:[197 TO 249] returns 2 records but I have two records with the price_1_1 = 249, it seems that the upper range is exclusive and I can't figure out why, can you help me? <dynamicField name="price_*" type="tfloat" indexed="true"/> <fieldType

Re: processing documents in solr

2013-07-29 Thread Erick Erickson
No, SolrJ doesn't provide this automatically. You'd be providing the counter by inserting it into the document as you create new docs. You could do this with any kind of document creation you are using. Best Erick On Mon, Jul 29, 2013 at 2:51 AM, Aditya findbestopensou...@gmail.com wrote: Hi,

Re: new field type - enum field

2013-07-29 Thread Erick Erickson
OK, if you can attach it to an e-mail, I'll attach it. Just to check, though, make sure you're logged in. I've been fooled once or twice by being automatically signed out... Erick On Mon, Jul 29, 2013 at 3:17 AM, Elran Dvir elr...@checkpoint.com wrote: Thanks, Erick. I have tried it four

Re: Performance vs. maxBufferedAddsPerServer=10

2013-07-29 Thread Mark Miller
SOLR-4816 won't address this - it will just speed up *different* parts. There are other things that will need to be done to speed up that part. - Mark On Jul 26, 2013, at 3:53 PM, Erick Erickson erickerick...@gmail.com wrote: This is currently a hard-coded limit from what I've understood. From

DIH to index the data - 250 millions - Need a best architecture

2013-07-29 Thread Santanu8939967892
Hi, I have a huge volume of DB records, close to 250 million. I am going to use DIH to index the data into Solr. I need the best architecture to index and query the data efficiently. I am using Windows Server 2008 with 16 GB RAM, zion processor and Solr 4.4. With Regards,

Re: DIH to index the data - 250 millions - Need a best architecture

2013-07-29 Thread Gora Mohanty
On 29 July 2013 17:30, Santanu8939967892 mishra.sant...@gmail.com wrote: Hi, I have a huge volume of DB records, which is close to 250 millions. I am going to use DIH to index the data into Solr. I need a best architecture to index and query the data in an efficient manner. [...] This is

Re: DIH to index the data - 250 millions - Need a best architecture

2013-07-29 Thread Jack Krupansky
The initial question is not how to index the data, but how you want to use or query the data. Use cases for query and data access should drive the data model that you will use to index the data. So, what are some sample queries? How will users want to search and access the data? What data

Re: .lock file not created when making a backup snapshot

2013-07-29 Thread Artem Karpenko
Thanks Mark! 29.07.2013 12:32, Mark Triggs wrote: Hi Artem, I noticed this recently too. I created a JIRA issue here: https://issues.apache.org/jira/browse/SOLR-5040 Cheers, Mark Artem Karpenko a.karpe...@oxseed.com writes: Hi, when making a backup snapshot using

Re: solr query range upper exclusive

2013-07-29 Thread Jack Krupansky
Square brackets are inclusive and curly braces are exclusive for range queries. I tried a similar example with the standard Solr example and it works fine: curl "http://localhost:8983/solr/update?commit=true" \ -H 'Content-type:application/json' -d '[{"id": "doc-1", "price_f": 249}]' curl
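The bracket rule above can be illustrated with a tiny query-string builder. This is a sketch, not anything from Solr itself: `range_query` is a made-up helper name, and mixing one square bracket with one curly brace in the same range is assumed to be accepted by the Solr 4.x query parser in use.

```python
def range_query(field: str, low, high,
                include_low: bool = True, include_high: bool = True) -> str:
    """Build a range clause: square bracket = inclusive, curly brace = exclusive."""
    open_bracket = "[" if include_low else "{"
    close_bracket = "]" if include_high else "}"
    return f"{field}:{open_bracket}{low} TO {high}{close_bracket}"


# Inclusive on both ends: a document with price_f = 249 should match.
print(range_query("price_f", 197, 249))
# Exclusive upper bound: a document with price_f = 249 should not match.
print(range_query("price_f", 197, 249, include_high=False))
```

If an inclusive query written this way still misses the boundary value, the field type definition is the next place to look.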

RE: swap and GC

2013-07-29 Thread Michael Ryan
This is interesting... How are you measuring the heap size? -Michael -Original Message- From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] Sent: Monday, July 29, 2013 5:34 AM To: solr-user@lucene.apache.org Subject: swap and GC Something interesting I have noticed today, after

Re: DIH to index the data - 250 millions - Need a best architecture

2013-07-29 Thread Santanu8939967892
Hi Jack, My sample query will be a keyword (text) and probably 2 to 3 filters. There is a Java interface for displaying the data, which will consume a class, and the class returns a data set object using SolrJ. So for display we will use a list for binding. We may display 20 or 30 meta data

Re: DIH to index the data - 250 millions - Need a best architecture

2013-07-29 Thread Jack Krupansky
You neglected to provide information about the filters or the 20 or 30 meta data information. Did you mean to imply that you will not be querying against the metadata (only returning it)? -- Jack Krupansky -Original Message- From: Santanu8939967892 Sent: Monday, July 29, 2013 9:41

Re: solr query range upper exclusive

2013-07-29 Thread alin1918
what query parser should I use? http://wiki.apache.org/solr/SolrQuerySyntax Differences From Lucene Query Parser Differences in the Solr Query Parser include Range queries [a TO z], prefix queries a*, and wildcard queries a*b are constant-scoring (all matching documents get an equal

restricting a query by a set of field values

2013-07-29 Thread Benjamin Ryan
Hi, Is it possible to construct a query in SOLR that is restricted to only those documents that have a field value in a particular set of values, similar to what would be done in Postgres with the SQL query: SELECT date_deposited FROM stats

The meaning of the of the doc= on the debugQuery output

2013-07-29 Thread Bruno René Santos
Hello One line on my debugQuery of a query is 2.1706323e-6 = score(doc=49578,freq=1.0 = termfreq=1.0), product of: I wanted to know what the doc= means. It seems to be something used on the fieldWeight but on the other hand it is the same for all fields on the document, regardless of the query

Re: restricting a query by a set of field values

2013-07-29 Thread Jason Hellman
Ben, This could be constructed as so: fl=date_deposited&fq=date[2013-07-01T00:00:00Z TO 2013-07-31T23:59:00Z]&fq=collection_id(1 2 n)&q.op=OR The parentheses around the 1 2 n set indicate a boolean query, and we're ensuring they are an OR boolean by the q.op parameter. This should get you the

SolrCloud and Joins

2013-07-29 Thread David Larochelle
I'm setting up SolrCloud with around 600 million documents. The basic structure of each document is: stories_id: integer, media_id: integer, sentence: text_en We have a number of stories from different media and we treat each sentence as a separate document because we need to run sentence level

Re: DIH to index the data - 250 millions - Need a best architecture

2013-07-29 Thread Shawn Heisey
On 7/29/2013 6:00 AM, Santanu8939967892 wrote: Hi, I have a huge volume of DB records, which is close to 250 millions. I am going to use DIH to index the data into Solr. I need a best architecture to index and query the data in an efficient manner. I am using windows server 2008 with 16

Re: The meaning of the of the doc= on the debugQuery output

2013-07-29 Thread fbrisbart
Hi, doc is the internal docId of the index. Each doc in the index has an internal id. It starts from 1 (1st doc inserted in the index), 2 for the 2nd, ... Franck Brisbart On Monday, 29 July 2013 at 15:34 +0100, Bruno René Santos wrote: Hello One line on my debugQuery of a query is

Solr 4.3.1 - query does not return documents, just numFounds, 2 shards, replication Factor 1

2013-07-29 Thread Nitin Agarwal
Hi, I am using Solr 4.3.1 with 2 Shards and replication factor of 1, running on apache tomcat 7.0.42 with external zookeeper 3.4.5. When I query select?q=*:* I only get the number of documents found, but no actual document. When I query with rows=0, I do get correct count of documents in the

Solr Out Of Memory with Field Collapsing

2013-07-29 Thread tushar_k47
Hi, We are using the field collapsing feature with multiple shards. We ran into Out of Memory errors on one of the shards. We use field collapsing on a particular field which has only one specific value on the shard that goes out of memory. Interestingly the Out of Memory error recurred multiple

Re: SolrCloud and Joins

2013-07-29 Thread Walter Underwood
Denormalize. Add media_set_id to each sentence document. Done. wunder On Jul 29, 2013, at 7:58 AM, David Larochelle wrote: I'm setting up SolrCloud with around 600 million documents. The basic structure of each document is: stories_id: integer, media_id: integer, sentence: text_en We

solr - set fileds as default search field

2013-07-29 Thread Mysurf Mail
The following query works well for me: http://[]:8983/solr/vault/select?q=VersionComments%3AWhite It returns all the documents where the version comments include White. I tried to omit the field name and set it as a default value as follows: In solrconfig I write <requestHandler name="/select"

Re: solr - set fileds as default search field

2013-07-29 Thread Ahmet Arslan
Hi, df is a single-valued parameter. Only one field can be a default field. To query multiple fields use the (e)dismax query parser: http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29 From: Mysurf Mail stammail...@gmail.com To:

Re: Solr 4.3.1 - query does not return documents, just numFounds, 2 shards, replication Factor 1

2013-07-29 Thread Jason Hellman
Nitin, You need to ensure the fields you wish to see are marked stored=true in your schema.xml file, and you should include fields in your fl= parameter (fl=*,score is a good place to start). Jason On Jul 29, 2013, at 8:08 AM, Nitin Agarwal 2nitinagar...@gmail.com wrote: Hi, I am using Solr

Re: solr - set fileds as default search field

2013-07-29 Thread Jason Hellman
Or use the copyField technique to a single searchable field and set df= to that field. The example schema does this with the field called text. On Jul 29, 2013, at 8:35 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, df is a single valued parameter. Only one field can be a default field.

Re: SolrCloud and Joins

2013-07-29 Thread David Larochelle
We'd like to be able to easily update the media set to source mapping. I'm concerned that if we store the media_sets_id in the sentence documents, it will be very difficult to add additional media set to source mapping. I imagine that adding a new media set would either require reimporting all 600

Re: Solr 4.3.1 - query does not return documents, just numFounds, 2 shards, replication Factor 1

2013-07-29 Thread Nitin Agarwal
Jason, all my fields are set with stored=true and indexed=true, and I used select?q=*:*&fl=*,score but I still get the same response: <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">138</int> <lst name="params"> <str name="fl">*,score</str> <str

Re: restricting a query by a set of field values

2013-07-29 Thread Chris Hostetter
: fl=date_deposited&fq=date[2013-07-01T00:00:00Z TO 2013-07-31T23:59:00Z]&fq=collection_id(1 2 n)&q.op=OR typo -- the colon is missing... fq=collection_id:(1 2 n) if you don't want the q.op to apply globally to your request, you can also scope it only for that filter. likewise the field_name:

Re: Solr 4.3.1 - query does not return documents, just numFounds, 2 shards, replication Factor 1

2013-07-29 Thread Jack Krupansky
Check the /select request handler in solrconfig. See if it defaults start or rows. start is the initial document number (e.g., 1), and rows is the number of rows to actually return in the response (nothing to do with numFound). The internal Solr default is rows=10, but you can set it to 20,
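As a rough illustration of how start and rows interact, the sketch below computes them for a given page. The helper name `paging_params` is hypothetical, and it assumes start is treated as a zero-based offset into the result list, so the first page uses start=0.

```python
from typing import Dict


def paging_params(page: int, rows: int = 10) -> Dict[str, int]:
    """Return the start/rows pair for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # start is an offset, so page 1 begins at offset 0.
    return {"start": (page - 1) * rows, "rows": rows}


print(paging_params(1))       # first page with the default page size
print(paging_params(3, 20))   # third page, 20 rows per page
```

Note that rows=0 in a handler's defaults would give exactly the symptom described in this thread: a correct numFound with no documents returned.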

Re: processing documents in solr

2013-07-29 Thread Joe Zhang
I'll try reindexing the timestamp. The id-creation approach suggested by Erick sounds attractive, but the nutch/solr integration seems rather tight. I don't know where to break in to insert the id into solr. On Mon, Jul 29, 2013 at 4:11 AM, Erick Erickson erickerick...@gmail.com wrote: No SolrJ

Re: SolrCloud and Joins

2013-07-29 Thread Walter Underwood
A join may seem clean, but it will be slow and (currently) doesn't work in a cluster. You find all the sentences in a media set by searching for that set id and requesting only the sentence_id (yes, you need that). Then you reindex them. With small documents like this, it is probably fairly

Re: SolrCloud shard down

2013-07-29 Thread Katie McCorkell
I am using Solr 4.3.1. I did a hard commit after indexing. I think you're right that the node was still recovering. I didn't think so since it didn't show up as yellow recovering on the visual display, but after quite a while it went from Down to Active. Thanks! On Fri, Jul 26, 2013 at 7:59 PM,

Re: Solr 4.3.1 - query does not return documents, just numFounds, 2 shards, replication Factor 1

2013-07-29 Thread Chris Hostetter
: Here is what my schema looks like what is your uniqueKey field? I'm going to bet it's tn_lookup_key_id and i'm going to bet your lowercase fieldType has an interesting analyzer on it. you are probably hitting a situation where the analyzer you have on your uniqueKey field is munging the

Re: SolrCloud shard down

2013-07-29 Thread Mark Miller
On Jul 29, 2013, at 12:49 PM, Katie McCorkell katiemccork...@gmail.com wrote: I didn't think so since it didn't show up as yellow recovering on the visual display, but after quite a while it went from Down to Active . Thanks! Thanks, I think we should improve this! We should publish a

Re: Performance vs. maxBufferedAddsPerServer=10

2013-07-29 Thread Erick Erickson
Why wouldn't it? Or are you saying that the routing to replicas from the leader is also 10/packet? Hmmm, hadn't thought of that... On Mon, Jul 29, 2013 at 7:58 AM, Mark Miller markrmil...@gmail.com wrote: SOLR-4816 won't address this - it will just speed up *different* parts. There are other

Re: DIH to index the data - 250 millions - Need a best architecture

2013-07-29 Thread Mikhail Khludnev
Mishra, What if you set up DIH with a single SQLEntityProcessor without caching, does it work for you? On Mon, Jul 29, 2013 at 4:00 PM, Santanu8939967892 mishra.sant...@gmail.com wrote: Hi, I have a huge volume of DB records, which is close to 250 millions. I am going to use DIH to index

Pentaho Kettle vs DIH

2013-07-29 Thread Mikhail Khludnev
Hello, Does anyone have experience using Pentaho Kettle for processing RDBMS data and pouring it into Solr? Isn't it some sort of replacement for the DIH? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com

Re: Solr 4.3.1 - query does not return documents, just numFounds, 2 shards, replication Factor 1

2013-07-29 Thread Nitin Agarwal
Erick, I had typed tn_lookup_key_id as lowercase and it was defined as <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory" /> <filter class="solr.LowerCaseFilterFactory" /> </analyzer> </fieldType> Nitin

solr sizing

2013-07-29 Thread Torsten Albrecht
Hi all, we have - 70 million to 100 million documents and we want - 800 requests per second. How many servers (Amazon EC2 / real hardware) do we need for this? Solr 4.x with SolrCloud, or better shards with a load balancer? Is anyone here who can give me some information, or who operates a similar

Re: Merged segment warmer Solr 4.4

2013-07-29 Thread Chris Hostetter
: I have a slow storage machine and non sufficient RAM for the whole index to : store all the index. This causes the first queries (~5000) to be very slow ... : Secondly I thought of initiating a new searcher event listener that queries : on docs that were inserted since the last hard

Re: solr sizing

2013-07-29 Thread Shawn Heisey
On 7/29/2013 2:18 PM, Torsten Albrecht wrote: we have - 70 mio documents to 100 mio documents and we want - 800 requests per second How many servers Amazon EC2/real hardware we Need for this? Solr 4.x with solr cloud or better shards with loadbalancer? Is anyone here who can give me some

SOLR replication question?

2013-07-29 Thread SolrLover
I am currently using SOLR 4.4. but not planning to use solrcloud in very near future. I have 3 master / 3 slave setup. Each master is linked to its corresponding slave.. I have disabled auto polling.. We do both push (using MQ) and pull indexing using SOLRJ indexing program. I have enabled

Solr Cloud - How to balance Batch and Queue indexing?

2013-07-29 Thread SolrLover
I need some advice on the best way to implement Batch indexing with soft commit / Push indexing (via queue) with soft commit when using SolrCloud. *I am trying to figure out a way to: * 1. Make the push indexing available almost real time (using soft commit) without degrading the search /

Re: SOLR replication question?

2013-07-29 Thread Shawn Heisey
I am currently using SOLR 4.4. but not planning to use solrcloud in very near future. I have 3 master / 3 slave setup. Each master is linked to its corresponding slave.. I have disabled auto polling.. We do both push (using MQ) and pull indexing using SOLRJ indexing program. I have enabled

Re: Streaming Updates Using HttpSolrServer.add(Iterator) In Solr 4.3

2013-07-29 Thread Shawn Heisey
I am indexing more than 300 million records, it takes less than 7 hours to index all the records.. Send the documents in batches and also use CUSS (ConcurrentUpdateSolrServer) for multi threading support. Ex: ConcurrentUpdateSolrServer server= new ConcurrentUpdateSolrServer(solrServer,

Re: Performance question on Spatial Search

2013-07-29 Thread Bill Bell
Can you compare with the old geo handler as a baseline? Bill Bell Sent from mobile On Jul 29, 2013, at 4:25 PM, Erick Erickson erickerick...@gmail.com wrote: This is very strange. I'd expect slow queries on the first few queries while these caches were warmed, but after that I'd expect

Re: Performance vs. maxBufferedAddsPerServer=10

2013-07-29 Thread Mark Miller
Yes, the internal document forwarding path is different and does not use the CloudSolrServer. It currently works with a buffer of 10. - Mark On Jul 29, 2013, at 3:10 PM, Erick Erickson erickerick...@gmail.com wrote: Why wouldn't it? Or are you saying that the routing to replicas from the