RE: How can I search the maximum number of a word in particular docs

2013-09-24 Thread Gupta, Abhinav
Not sure, though; it depends on how your schema is configured. If it has WordDelimiter
filters at index or query time, then the search will not behave as you expect.
Check your index creation using the Solr analysis page for this type of string.

Thanks,
Abhinav


-Original Message-
From: Viresh Modi [mailto:viresh.m...@highq.com] 
Sent: 24 September 2013 18:09
To: solr-user@lucene.apache.org
Subject: How can I search the maximum number of a word in particular docs

My query looks like:

start=0&rows=10&hl=true&hl.fl=content&qt=dismax
&q=pookan
&fl=id,application,timestamp,name,score,metaData,metaDataDate
&fq=application:OnlineR3_6_4
&fq=(metaData:channelId/101 OR metaData:channelId/104) &sort=score desc


but I am not getting results in the desired order:

id: OnlineR3_6_4_101_7    content: pookan pookan pookan
id: OnlineR3_6_4_101_20   content: pookan pookan pookan pookan pookan
id: OnlineR3_6_4_101_19   content: pookan pookan pookan pookan
id: OnlineR3_6_4_101_21   content: pookan pookan


Actually, I want the documents in which the particular word matches the most
times in the content field to come first (relevance-based)


Re: Problem loading my codec sometimes

2013-09-24 Thread Shawn Heisey
On 9/24/2013 6:32 PM, Scott Schneider wrote:
> I created my own codec and Solr can find it sometimes and not other times.  
> When I start fresh (delete the data folder and run Solr), it all works fine.  
> I can add data and query it.  When I stop Solr and start it again, I get:
> 
> Caused by: java.lang.IllegalArgumentException: A SPI class of type 
> org.apache.lucene.codecs.Codec with name 'MyCodec' does not exist. You need 
> to add the corresponding JAR file supporting this SPI to your classpath. The 
> current classpath supports the following names: [SimpleText, Appending, 
> Lucene40, Lucene3x, Lucene41, Lucene42]
> 
> I added the JAR to the path and I'm pretty sure Java sees it, or else it 
> would not be using my codec when I start fresh.  (I've looked at the index 
> files and verified that it's using my codec.)  I suppose Solr is asking SPI 
> for my codec based on the codec class name stored in the index files, but I 
> don't see why this would fail when a fresh start works.

What I always recommend for those who want to use custom and contrib
jars is that they put all such jars (and their dependencies) into
${solr.solr.home}/lib, don't use any <lib> directives in solrconfig.xml,
and don't put the sharedLib attribute into solr.xml.  Doing it in any
other way has a tendency to trigger bugs or to cause jars to get loaded
more than once.

The ${solr.solr.home} property defaults to $CWD/solr (CWD is current
working directory for those who don't already know) and is the location
of the solr.xml file.  Note that depending on the exact version of Solr
and which servlet container you are using, there may actually be two
solr.xml files, one which loads Solr into your container and one that
configures Solr.  I am referring to the latter.

If you are using the solr example and its directory layout, the
directory you would need to put all jars into is example/solr/lib ...
which is a directory that doesn't exist and has to be created.

http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29
http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

Thanks,
Shawn



Problem loading my codec sometimes

2013-09-24 Thread Scott Schneider
Hello,

I created my own codec and Solr can find it sometimes and not other times.  
When I start fresh (delete the data folder and run Solr), it all works fine.  I 
can add data and query it.  When I stop Solr and start it again, I get:

Caused by: java.lang.IllegalArgumentException: A SPI class of type 
org.apache.lucene.codecs.Codec with name 'MyCodec' does not exist. You need to 
add the corresponding JAR file supporting this SPI to your classpath. The 
current classpath supports the following names: [SimpleText, Appending, 
Lucene40, Lucene3x, Lucene41, Lucene42]

I added the JAR to the path and I'm pretty sure Java sees it, or else it would 
not be using my codec when I start fresh.  (I've looked at the index files and 
verified that it's using my codec.)  I suppose Solr is asking SPI for my codec 
based on the codec class name stored in the index files, but I don't see why 
this would fail when a fresh start works.

Any thoughts?

Thanks,
Scott



RE: Querying a non-indexed field?

2013-09-24 Thread Scott Schneider
Ok, thanks for your answers!

Scott


> -Original Message-
> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> Sent: Wednesday, September 18, 2013 5:36 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Querying a non-indexed field?
> 
> Moreover, you may be trying to save/optimize in a wrong place. Maybe
> these
> additional indexed fields are not so costly. Maybe you can optimize in
> some
> other part of your setup.
> 
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Sep 18, 2013 5:47 PM, "Chris Hostetter" 
> wrote:
> 
> >
> > : Subject: Re: Querying a non-indexed field?
> > :
> > : No.  --wunder
> >
> > To elaborate just a bit...
> >
> > : query on a few indexed fields, getting a small # of results.  I
> want to
> > : restrict this further based on values from non-indexed, stored
> fields.
> > : I can obviously do this myself, but it would be nice if Solr could
> do
> >
> > ...you could implement this in a custom SearchComponent, or custom
> qparser
> > that would generate PostFilter compatible queries, that looked at the
> > stored field values -- but it's extremely unlikely that you would
> ever
> > convince any of the lucene/solr devs to agree to commit a general
> purpose
> > version of this type of logic into the code base -- because in the
> general
> > case (arbitrary unknown number of documents matching the main query)
> it
> > would be extremely inefficient and would encourage "bad" user
> behavior.
> >
> > -Hoss
> >



Re: Problem running EmbeddedSolr (spring data)

2013-09-24 Thread Furkan KAMACI
Run the Maven dependency tree command (mvn dependency:tree) and you can easily
find the cause of the dependency conflict; if not, you can send your
command-line output and we can help you.

On Saturday, 21 September 2013, Erick Erickson wrote:
> bq: Caused by: java.lang.NoSuchMethodError:
>
> This usually means that you have a mixture of old and new jars around
> and have compiled against one and are finding the other one in your
> classpath.
>
> Best,
> Erick
>
> On Fri, Sep 20, 2013 at 9:37 AM, JMill 
wrote:
>> What is the cause of this stacktrace?
>>
>> I am working with the following Solr Maven dependencies:
>>
>> <solr-core-version>4.4.0</solr-core-version>
>> 1.0.0.RC1
>>
>> Stacktrace
>>
>> SEVERE: Exception sending context initialized event to listener instance
of
>> class org.springframework.web.context.ContextLoaderListener
>> org.springframework.beans.factory.BeanCreationException: Error creating
>> bean with name 'solrServerFactoryBean' defined in class path resource
>> [com/project/core/config/EmbeddedSolrContext.class]: Invocation of init
>> method failed; nested exception is java.lang.NoSuchMethodError:
>>
org.apache.solr.core.CoreContainer.<init>(Ljava/lang/String;Ljava/io/File;)V
>> at
>>
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1482)
>> at
>>
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:521)
>> at
>>
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:458)
>> at
>>
org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:295)
>> at
>>
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:223)
>> at
>>
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:292)
>> at
>>
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194)
>> at
>>
org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:608)
>> at
>>
org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:932)
>> at
>>
org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:479)
>> at
>>
org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:389)
>> at
>>
org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:294)
>> at
>>
org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:112)
>> at
>>
org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4887)
>> at
>>
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5381)
>> at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
>> at
>>
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
>> at
>>
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> at
>>
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>>
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:722)
>> Caused by: java.lang.NoSuchMethodError:
>>
org.apache.solr.core.CoreContainer.<init>(Ljava/lang/String;Ljava/io/File;)V
>> at
>>
org.springframework.data.solr.server.support.EmbeddedSolrServerFactory.createPathConfiguredSolrServer(EmbeddedSolrServerFactory.java:9


Re: Best practice to index and query SolrCloud

2013-09-24 Thread shamik
 Thanks for the insight Shawn, extremely helpful. 

Appreciate it.





Best practice to index and query SolrCloud

2013-09-24 Thread shamik
Hi, I'm new to SolrCloud, trying to set up a test environment based on the
wiki documentation. Based on the example, the setup and sample indexing /
query works fine. But I need some pointers on the best practices of indexing
/ querying in SolrCloud. E.g., I've 2 shards, with 1 leader and a
corresponding replica each. Let's say each of them is running on its own
dedicated server.

Now, I'm using the SolrJ client (CloudSolrServer) to send documents for
indexing. Based on SolrCloud fundamentals, I can send the document to any of
the four servers or to a specific shard id. Is it advisable to use the server
information directly in the client? In case the specific node goes down, then
indexing will fail. Is it recommended to have a load balancer (HAProxy, ELB
in Amazon) for the indexing purpose?

Same applies during query time. I know we can add a query parameter and
include all four server information. But then any change in the server
configuration will have an impact. Any help will be appreciated.

Regards,
Shamik




Re: Best practice to index and query SolrCloud

2013-09-24 Thread Shawn Heisey

On 9/24/2013 2:46 PM, Shamik Bandopadhyay wrote:

Now, I'm using SolrJ client (CloudSolrServer) to send documents for
indexing. Based on SolrCloud fundamentals, I can send the document to any
of the four servers or to a specific shard id. Is it advisable to use the
server information directly in the client? In case the specific node
goes down, then indexing will fail. Is it recommended to have a load
balancer (Haproxy , ELB in Amazon) for the indexing purpose ?


CloudSolrServer contains a zookeeper client.  When you create an 
instance, you don't give it the URL for Solr, you tell it about your 
zookeeper ensemble, using the same zkHost info you give to Solr itself. 
 It is always aware of the clusterstate and uses that information to 
decide where the actual Solr requests go.


When SolrJ 4.5 comes out (which is going to be very soon), it will know 
how to route updates to the correct shard leader, so indexing will be 
even more efficient.


You will only need a load balancer if you use Solr URLs directly or use 
a programming API that is unaware of zookeeper.



Same applies during query time. I know we can add a query parameter and
include all four server information. But then any change in the server
configuration will have an impact. Any help will be appreciated.


What I said above for indexing applies equally to queries. 
CloudSolrServer will load balance queries across all operational servers 
automatically.
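
A minimal SolrJ sketch of this (assuming the Solr 4.4-era API; the zkHost
string and collection name are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudExample {
      public static void main(String[] args) throws Exception {
        // the same zkHost string you give Solr itself -- no Solr URLs needed
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        // indexing: the zookeeper-aware client picks a live node from clusterstate
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "hello solrcloud");
        server.add(doc);
        server.commit();

        // querying: requests are load balanced across operational servers
        QueryResponse rsp = server.query(new SolrQuery("title:hello"));
        System.out.println(rsp.getResults().getNumFound());
      }
    }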


Thanks,
Shawn



Re: Some text not indexed in solr4.4

2013-09-24 Thread Utkarsh Sengar
WordDelimiterFilterFactory was the culprit. Removing that fixed the problem.


Thanks,
-Utkarsh


On Tue, Sep 24, 2013 at 12:17 PM, Utkarsh Sengar wrote:

> @Furkan Yes, I have run a commit, other text is searchable.
> Not sure what you mean there for MultiPhraseQuery. It is mentioned in
> context to SynonymFilterFactory, RemoveDuplicatesTokenFilterFactory and
> PositionFilterFactory. Which part are you referring to?
>
> @Jason I get this response (I have multi-core setup) by hitting this URL:
> http://SOLR_SERVER/solr/prodinfo/terms?terms.fl=text&terms.prefix=dc
>
> <int name="status">0</int><int name="QTime">0</int> <lst name="text"/>
>
> Not sure how I can infer anything from this response. I get the same response for any
> prefix like: a, b, iph etc.
>
> My guess is this is happening due to WordDelimiterFilterFactory here:
> https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L16, what do
> you think? Is dc44 somehow delimited at query time?
> Example here says:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> -> Split on letter-number transitions (can be turned off - see
> splitOnNumerics parameter) "SD500" -> "SD", "500"
>
> I will test it out and update this thread with my findings.
>
> Thanks,
> -Utkarsh
>
>
>
> On Tue, Sep 17, 2013 at 5:10 PM, Jason Hellman <
> jhell...@innoventsolutions.com> wrote:
>
>> Utkarsh,
>>
>> Check to see if the value is actually indexed into the field by using the
>> Terms request handler:
>>
>> http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=d
>>
>> (adjust the prefix to whatever you're looking for)
>>
>> This should get you going in the right direction.
>>
>> Jason
>>
>>
>> On Sep 17, 2013, at 2:20 PM, Utkarsh Sengar 
>> wrote:
>>
>> > I have a copyField called allText with type text_general:
>> > https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68
>> >
>> > I have ~100 documents which have the text: dyson and dc44 or dc41 etc.
>> >
>> > For example:
>> > "title": "Dyson DC44 Animal Digital Slim Cordless Vacuum"
>> > "description": "The DC44 Animal is the new Dyson Digital Slim vacuum
>> > cleaner  the cordless machine that doesn’t lose suction. It has been
>> > engineered for floor to ceiling cleaning. DC44 Animal has a detachable
>> > long-reach wand  which is balanced for floor to ceiling cleaning.   The
>> > motorized floor tool has twice the power of the DC35 floor tool  to
>> drive
>> > the bristles deeper into the carpet pile with more force. It attaches to
>> > the wand or directly to the machine for cleaning awkward spaces. The
>> brush
>> > bar has carbon fiber filaments for removing fine dust from hard floors.
>> > DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode.
>> > Powered by the Dyson digital motor  DC44 Animal has a fade-free nickel
>> > manganese cobalt battery and Root Cyclone technology for constant
>>  powerful
>> > suction.",
>> > UPC: 0879957006362
>> >
>> > The documents are indexed.
>> >
>> > Analysis says it's indexed: http://i.imgur.com/O52ino1.png
>> > But when I search for allText:"dyson dc44" I get no results, response:
>> > http://pastie.org/8334220
>> >
>> > Any suggestions about the problem? I am out of ideas about how to debug
>> > this.
>> >
>> > --
>> > Thanks,
>> > -Utkarsh
>>
>>
>
>
> --
> Thanks,
> -Utkarsh
>



-- 
Thanks,
-Utkarsh


Best practice to index and query SolrCloud

2013-09-24 Thread Shamik Bandopadhyay
Hi,

  I'm new to SolrCloud , trying to set up a test environment based on
the wiki documentation. Based on the example, the setup and sample indexing
/ query works fine. But I need some pointers on the best practices of
indexing / querying in SolrCloud. E.g., I've 2 shards, with 1 leader and a
corresponding replica each. Let's say each of them is running on its own
dedicated server.

Now, I'm using SolrJ client (CloudSolrServer) to send documents for
indexing. Based on SolrCloud fundamentals, I can send the document to any
of the four servers or to a specific shard id. Is it advisable to use the
server information directly in the client? In case the specific node
goes down, then indexing will fail. Is it recommended to have a load
balancer (Haproxy , ELB in Amazon) for the indexing purpose ?

Same applies during query time. I know we can add a query parameter and
include all four server information. But then any change in the server
configuration will have an impact. Any help will be appreciated.

Thanks,

Shamik


Re: Soft commit and flush

2013-09-24 Thread Shawn Heisey

On 9/24/2013 5:51 AM, adfel70 wrote:

My conclusion is that soft commit always flushes the data, but because of
the implementation of NRTCachingDirectoryFactory, the data will be written
to the disk when it's getting too big.


The NRTCachingDirectoryFactory (which creates NRTCachingDirectory 
instances) used by default in newer Solr versions has default settings 
for some of its parameters that show up in the solr log:


maxCacheMB=48.0 maxMergeSizeMB=4.0

The constructor javadocs for NRTCachingDirectory show what circumstances 
will cause the directory to use RAM instead of flushing to disk:


http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/store/NRTCachingDirectory.html#NRTCachingDirectory%28org.apache.lucene.store.Directory,%20double,%20double%29

"We will cache a newly created output if 1) it's a flush or a merge and 
the estimated size of the merged segment is <= maxMergeSizeMB, and 2) 
the total cached bytes is <= maxCachedMB"
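
A minimal Lucene sketch of that wrapping (assuming the Lucene 4.4 API; the
index path is a placeholder):

    import java.io.File;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.NRTCachingDirectory;

    public class NrtDirExample {
      public static void main(String[] args) throws Exception {
        // wrap an on-disk directory with the defaults Solr logs:
        // maxMergeSizeMB=4.0, maxCachedMB=48.0
        FSDirectory fsDir = FSDirectory.open(new File("/path/to/index"));
        NRTCachingDirectory dir = new NRTCachingDirectory(fsDir, 4.0, 48.0);
        // small flushed or merged segments now stay in RAM until they grow
        // past those caps, at which point they are written through to fsDir
        System.out.println(dir);
      }
    }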


Thanks,
Shawn



Re: Some text not indexed in solr4.4

2013-09-24 Thread Utkarsh Sengar
@Furkan Yes, I have run a commit, other text is searchable.
Not sure what you mean there for MultiPhraseQuery. It is mentioned in
context to SynonymFilterFactory, RemoveDuplicatesTokenFilterFactory and
PositionFilterFactory. Which part are you referring to?

@Jason I get this response (I have multi-core setup) by hitting this URL:
http://SOLR_SERVER/solr/prodinfo/terms?terms.fl=text&terms.prefix=dc

<int name="status">0</int><int name="QTime">0</int> <lst name="text"/>

Not sure how I can infer anything from this response. I get the same response for any
prefix like: a, b, iph etc.

My guess is this is happening due to WordDelimiterFilterFactory here:
https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L16, what do
you think? Is dc44 somehow delimited at query time?
Example here says:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
-> Split on letter-number transitions (can be turned off - see
splitOnNumerics parameter) "SD500" -> "SD", "500"

I will test it out and update this thread with my findings.

Thanks,
-Utkarsh



On Tue, Sep 17, 2013 at 5:10 PM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> Utkarsh,
>
> Check to see if the value is actually indexed into the field by using the
> Terms request handler:
>
> http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=d
>
> (adjust the prefix to whatever you're looking for)
>
> This should get you going in the right direction.
>
> Jason
>
>
> On Sep 17, 2013, at 2:20 PM, Utkarsh Sengar  wrote:
>
> > I have a copyField called allText with type text_general:
> > https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68
> >
> > I have ~100 documents which have the text: dyson and dc44 or dc41 etc.
> >
> > For example:
> > "title": "Dyson DC44 Animal Digital Slim Cordless Vacuum"
> > "description": "The DC44 Animal is the new Dyson Digital Slim vacuum
> > cleaner  the cordless machine that doesn’t lose suction. It has been
> > engineered for floor to ceiling cleaning. DC44 Animal has a detachable
> > long-reach wand  which is balanced for floor to ceiling cleaning.   The
> > motorized floor tool has twice the power of the DC35 floor tool  to drive
> > the bristles deeper into the carpet pile with more force. It attaches to
> > the wand or directly to the machine for cleaning awkward spaces. The
> brush
> > bar has carbon fiber filaments for removing fine dust from hard floors.
> > DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode.
> > Powered by the Dyson digital motor  DC44 Animal has a fade-free nickel
> > manganese cobalt battery and Root Cyclone technology for constant
>  powerful
> > suction.",
> > UPC: 0879957006362
> >
> > The documents are indexed.
> >
> > Analysis says it's indexed: http://i.imgur.com/O52ino1.png
> > But when I search for allText:"dyson dc44" I get no results, response:
> > http://pastie.org/8334220
> >
> > Any suggestions about the problem? I am out of ideas about how to debug
> > this.
> >
> > --
> > Thanks,
> > -Utkarsh
>
>


-- 
Thanks,
-Utkarsh


ANNOUNCE: Lucene/Solr Revolution EU 2013 - Session List & Early Bird Pricing

2013-09-24 Thread Chris Hostetter


(NOTE: cross-posted to various lists, please reply only to general@lucene 
w/ any questions or follow ups)


Hey folks,

2 announcements regarding the upcoming Lucene/Solr Revolution EU 2013 in 
Dublin (November 4-7)...


## 1) Session List Now Posted

I'd like to thank everyone who helped vote for the sessions that 
interested you during the community voting period.  The bulk of the 
sessions that were selected, and will be presented, are now listed online 
-- a few more will be added once we get final confirmation from the 
remaining speakers who were selected...


  http://lucenerevolution.org/sessions

## 2) Early Bird Pricing Ends Soon

"Early bird" discount registration pricing is available until Monday, 
September 30th -- after that, the registration cost will increase by $100 
USD.  So if you are planning to go, you should register soon and save some 
money...


  http://lucenerevolution.org/registration


Additional details about the conference can be found at the website, or 
feel free to reply to this email with any questions...


  http://lucenerevolution.org


-Hoss


dih HTMLStripTransformer

2013-09-24 Thread Andreas Owen
Why does stripHTML="false" have no effect in DIH? The HTML is stripped in text
and text_nohtml when I display the index with select?q=*.

I'm trying to get one field without HTML and one with it, so I can also index
the links on the page.

data-config.xml

Re: Get only those documents that are fully satisfied.

2013-09-24 Thread asuka
Thanks Chris,
that's exactly what I was looking for.

One last question. As far as I can see, the solution you are offering,
termfreq, is for Solr 4+, isn't it?

Right now I'm working with Solr 3.6.2. Is there any solution for such
version or do I need an upgrade?

Kind Regards





Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-24 Thread JMill
Hi,

I'm using Solr's Suggester function to implement an autocomplete feature.
I have it set up to check against the "username" and "name" fields.  The
problem is that when running a query against the name, the second term, after
the whitespace (the surname), returns 0 results.  It works if the query is a
partial name starting from the beginning, e.g. given the name "Bill Rogers",
a query for "Rogers" will return 0 results whereas a query for "Bill" will
return a positive result (Bill Rogers). As for the username, it's not working
at all.

I am after the following behaviour.

Match any partial words in the fields "username" or "name" and return the
results.  If there is a match in the field "name" then return the whole name,
e.g. given the queries "Rogers" or "Bill", return "Bill Rogers" (not the
single word that was a match).

schema.xml extract
...

solrconfig.xml

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">autocomplete</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

..

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


Re: How can I search the maximum number of a word in particular docs

2013-09-24 Thread Chris Hostetter
: &q=pookan
...
: Acutually i want particular word for that match max in content tag that
: come first (relevancy based)

the default TF-IDF scoring mechanism rewards documents for matching a term
multiple times (that's the "TF" part) but there is also a length
normalization factor that comes into play -- the idea being that a very
short document -- maybe only a paragraph -- containing 2 instances of
"pookan" is probably more relevant than a longer document containing 3
instances of the same term, if that longer document is several thousand
paragraphs.

you can disable this length normalization by using "omitNorms"...

https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties


-Hoss


SOLR grouped query sorting on numFound

2013-09-24 Thread Brent Ryan
We ran into a snag during development with Solr and I thought I'd run it by
everyone to see if anyone had any slick ways to solve this issue.

Basically, we're performing a SOLR query with grouping and want to be able
to sort by the number of documents found within each group.

Our query response from SOLR looks something like this:

{
  "responseHeader":{
    "status":0,
    "QTime":17,
    "params":{
      "indent":"true",
      "q":"*:*",
      "group.limit":"0",
      "group.field":"rfp_stub",
      "group":"true",
      "wt":"json",
      "rows":"1000"}},
  "grouped":{
    "rfp_stub":{
      "matches":18470,
      "groups":[{
          "groupValue":"java.util.UUID:a1871c9e-cd7f-4e87-971d-d8a44effc33e",
          "doclist":{"numFound":3,"start":0,"docs":[]}},
        {
          "groupValue":"java.util.UUID:0c2f1045-a32d-4a4d-9143-e09db45a20ce",
          "doclist":{"numFound":5,"start":0,"docs":[]}},
        {
          "groupValue":"java.util.UUID:a3e1d56b-4172-4594-87c2-8895c5e5f131",
          "doclist":{"numFound":6,"start":0,"docs":[]}},
        ...


The numFound shows the number of documents within that group.  Is there
any way to perform a sort on numFound in Solr?  I don't believe this is
supported, but wondered if anyone here has come across this and if there
were any suggested workarounds, given that the dataset is really too large to
hold in memory on our app servers?


Re: Get only those documents that are fully satisfied.

2013-09-24 Thread Chris Hostetter

: Your requirement is still somewhat ambiguous - you use "fully" and "some" in
: the same sentence. Which is it?

the request seems pretty clear to me...

:   I don't want to get documents that fit my whole query, I want those
: documents that are fully satisfied  with some terms of the query.

...my reading is:

 * given a set of documents each containing an arbitrary number of 
"doc_terms" in "field_f"
 * given a query "q" containing an arbitrary number of "q_terms"
 * find all documents where every "doc_term" in that document's "field_f" 
exists in the query as a "q_term"

ie: all terms of the document must exist in the query for the doc to 
match, but not all terms from the query must exist in a document.

There is no trivial out of the box solution at the moment, but there is a 
solution possible using function queries as described in 
this email...

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201308.mbox/%3Calpine.DEB.2.02.1308091122150.2685@frisbee%3E

Repeating the key bits below...

-Hoss


...

1) if you don't care about using non-trivial analysis (ie: you don't need 
stemming, or synonyms, etc..), you can do this with some really simple 
function queries -- assuming you index a field containing the number of 
"words" in each document, in addition to the words themselves.  Assuming 
your words are in a field named "words" and the number of words is in a 
field named "words_count" a request for something like "Galaxy Samsung S4" 
can be represented as...

  q={!frange l=0 u=0}sub(words_count,
     sum(termfreq('words','Galaxy'),
         termfreq('words','Samsung'),
         termfreq('words','S4')))

...ie: you want to compute the sum of the term frequencies for each of 
the words requested, and then you want to subtract that sum from the 
number of terms in the document -- and then you only want to match 
documents where the result of that subtraction is 0.

one complexity that comes up, is that you haven't specified:
  
  * can the list of words in your documents contain duplicates?
  * can the list of words in your query contain duplicates?
  * should a document with duplicate words match only if the query also 
contains the same word duplicated?

...the answers to those questions make the math more complicated (and are 
left as an exercise for the reader)


2) if you *do* care about using non-trivial analysis, then you can't use 
the simple "termfreq()" function, which deals with raw terms -- instead 
you have to use the "query()" function to ensure that the input is parsed 
appropriately -- but then you have to wrap that function in something that 
will normalize the scores - so in place of termfreq('words','Galaxy') 
you'd want something like...

if(query({!field f=words v='Galaxy'}),1,0)

...but again the math gets much harder if you make things more complex 
with duplicate words in the document or duplicate words in the query -- 
you'd probably have to use a custom similarity to get the scores returned 
by the query() function to be usable as is in the match equation (and drop 
the "if()" function)
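
A SolrJ sketch of wiring up option 1 (the field names words/words_count and
the terms come from the example above; this is a sketch only, assuming a
Solr 4.x HttpSolrServer):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FullySatisfiedExample {
      public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // match only docs where every indexed term also appears in the query:
        // words_count minus the summed term frequencies must equal 0
        SolrQuery q = new SolrQuery(
            "{!frange l=0 u=0}sub(words_count,"
          + "sum(termfreq('words','Galaxy'),"
          + "termfreq('words','Samsung'),"
          + "termfreq('words','S4')))");
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().getNumFound() + " fully satisfied docs");
      }
    }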


Re: Get only those documents that are fully satisfied.

2013-09-24 Thread Jack Krupansky
Your requirement is still somewhat ambiguous - you use "fully" and "some" in 
the same sentence. Which is it?


If you simply want documents that contain every one of the query terms, use 
the explicit AND operator ("+" or "AND") or set the default operator to 
"AND".


But... we are still in the dark as to your precise requirement.

-- Jack Krupansky

-Original Message- 
From: asuka

Sent: Tuesday, September 24, 2013 11:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Get only those documents that are fully satisfied.

Hi Andre,
  I don't want to get documents that fit my whole query, I want those
documents that are fully satisfied  with some terms of the query.

In other words, I'm interested in an exact match from the point of view of
the document, not from the point of view of the query.

Asuka



Andre Bois-Crettez wrote

(Your schema and query only appear on the nabble.com forum, it is mostly
empty for me on the mailing list)

What you want is probably to change OR to AND:

params.set("q.op", "AND");









Re: Excluding a facet's constraint to exclude a facet

2013-09-24 Thread Chris Hostetter

: documentation that I can limit results to category "A" as follows:
: 
: fq={!raw f=foo}A
: 
: But I cannot seem to (Solr 3.6.1) exclude that way:
: 
: fq={!raw f=foo}-A

with the "raw" qparser, there is no markup syntax at all -- so it's 
interpreting the "-" as part of the literal term value you are trying to 
query for.

: And the simpler test (with edismax) doesn't work either:
: 
: fq=foo:A# works
: fq=foo:-A   # doesn't work

likewise: in the lucene/dismax/edismax parsers, operators (like "-" and 
"+") need to come before the field you are querying on...

   fq=-foo:A
or fq={!edismax}-foo:A


If you upgrade to a more current 4.x version of Solr, then the 
default (lucene) parser in solr has been updated to recognize 
nested parser syntax (ie: "{!parser}input") as inline clauses, so you can 
use something like this...

   fq=-{!raw f=foo}A

...which results in the (default) lucene parser recognizing the "-" 
operator should be applied to a nested clause which is generated by asking 
the "raw" parser to use the local param "f=foo" when parsing the input "A"

One thing to watch out for however is whitespace -- if you have a query 
like this...

   fq=-{!raw f=foo}AAA BBB

...then the (default) lucene parser gets 2 clauses: a negated clause 
resulting from asking the "raw" parser to parse "AAA" and a positive 
clause that the lucene parser parses itself, using the default search 
field to look for "BBB".

You would need to use the "v" local param to ensure that the entire string 
gets parsed by the raw parser, either...

   fq=-{!raw f=foo v='AAA BBB'}
or fq=-{!raw f=foo v=$my_foo_fq}&my_foo_fq=AAA BBB




-Hoss


Excluding a facet's constraint to exclude a facet

2013-09-24 Thread Dan Davis
Summary - when constraining a search using filter query, how can I exclude
the constraint for a particular facet?

Detail - Suppose I have the following facet results for a query "q=mainquery":

<lst name="foo">
  <int>491</int>
  <int>111</int>
  <int>103</int>
  ...
</lst>
...

I understand from
http://people.apache.org/~hossman/apachecon2010/facets/ and the Wiki
documentation that I can limit results to category "A" as follows:

fq={!raw f=foo}A

But I cannot seem to (Solr 3.6.1) exclude that way:

fq={!raw f=foo}-A

And the simpler test (with edismax) doesn't work either:

fq=foo:A# works
fq=foo:-A   # doesn't work

Do I need to be using facet.method=enum to get this to work?   What else
could be the problem here?


Re: How to sort over all documents by score after Result Grouping / Field Collapsing

2013-09-24 Thread go2jun
Thanks Erick for your response.

My goal is:
1. Search from Solr. In the search results, we would like to show no more
than two results from the same source id.
2. We would like all these results sorted by their score.

So if I use Solr result grouping to get the top two results from each group,
then I need to un-group them.

So my question is: is there any pure Solr solution to handle this? I'd prefer
it be handled by Solr rather than by my application, because the search
results are very large.

Thanks!
Jun






Re: How can I search the maximum number of a word in particular docs

2013-09-24 Thread Upayavira
Are you saying that the more times the word appears, the higher you want
it to score?

Note: add debugQuery=true to your query and look near the end of the
output; you will be able to see exactly how the score was calculated and
thus which component wasn't behaving as you expected (you might want to
review this info alongside a Solr book or two, it is quite complex).

Looking at your example below, I suspect that all of your examples have
the same score, so are sorted randomly.

Upayavira

On Tue, Sep 24, 2013, at 01:38 PM, Viresh Modi wrote:
> My query looks like:
> 
> start=0&rows=10&hl=true&hl.fl=content&qt=dismax
> &q=pookan
> &fl=id,application,timestamp,name,score,metaData,metaDataDate
> &fq=application:OnlineR3_6_4
> &fq=(metaData:channelId/101 OR metaData:channelId/104)
> &sort=score desc
> 
> 
> but I am not getting results in the desired order:
> 
> id: OnlineR3_6_4_101_7    content: pookan pookan pookan
> id: OnlineR3_6_4_101_20   content: pookan pookan pookan pookan pookan
> id: OnlineR3_6_4_101_19   content: pookan pookan pookan pookan
> id: OnlineR3_6_4_101_21   content: pookan pookan
> 
> 
> 
> Actually, I want the documents in which the particular word matches the
> most times in the content field to come first (relevance-based)


Re: DIH field defaults or re-assigning field values

2013-09-24 Thread P Williams
I discovered how to use the ScriptTransformer, which worked to solve my
problem.  I had to make use of context.setSessionAttribute(...,...,'global')
to store a flag for the value in the file, because the script is only called
if there are rows to transform and I needed to know when the default was
appropriate to set in the root entity.

Thanks for your suggestions Alex.

Cheers,
Tricia


On Wed, Sep 18, 2013 at 1:19 PM, P Williams
wrote:

> Hi All,
>
> I'm using the DataImportHandler to import documents to my index.  I assign
> one of my document's fields by using a sub-entity from the root to look for
> a value in a file.  I've got this part working.  If the value isn't in the
> file or the file doesn't exist I'd like the field to be assigned a default
> value.  Is there a way to do this?
>
> I think I'm looking for a way to re-assign the value of a field.  If this
> is possible then I can assign the default value in the root entity and
> overwrite it if the value is found in the sub-entity. Ideas?
>
> Thanks,
> Tricia
>


Re: Interesting edismax/qs bug in Solr 3.5

2013-09-24 Thread Arcadius Ahouansou
Thanks Michael.

Arcadius.


On 23 September 2013 05:32, Michael Ryan  wrote:

> Sounds like https://issues.apache.org/jira/browse/LUCENE-3821 (issue
> seems to be fixed but still shows as open).
>
> -Michael
>
> -Original Message-
> From: Arcadius Ahouansou [mailto:arcad...@menelic.com]
> Sent: Sunday, September 22, 2013 11:15 PM
> To: solr-user
> Subject: Interesting edismax/qs bug in Solr 3.5
>
> We have been seeing a strange bug in our prod Solr 3.5.
>
> I went to download a fresh copy of Solr3.5, with default schema  and
> indexed (curl or post.jar) the following 2 docs
>
> [
>{
>   "id":"1",
>   "title":"One Earth"
>},
>{
>   "id":"2",
>   "title":"One Love One Earth"
>}
>
> ]
>
>
> I could browse and see the docs in solr.
>
> However, when I do:
> /solr/select?q="One Love One Earth"&qf=title&qs=2&defType=edismax&pf=title
>
> I get nothing back.
> when I change qs=4 in the query, then I see the expected doc2.
> debugQuery=true does not reveal anything.
>
> - I have noticed that when I reverse the order of the documents in the
> input file, i.e. doc2 first, then doc1, and do the indexing (using curl or
> post.jar), then the query above works and returns doc2 as expected.
> - Same when I index only doc2 (doc1 not indexed).
>
> I tested solr3.6.2  and 4.4.0 and I can confirm they are not affected by
> this issue.
>
> I looked at the change logs for 3.6.2 and jira but could not find any
> trace of this problem.
>
> Any pointer to the ticket that addressed this issue will be appreciated.
>
>
> Thank you very much.
>
>
> Arcadius.
> .
>


Re: Get only those documents that are fully satisfied.

2013-09-24 Thread asuka
Hi Andre,
   I don't want to get documents that fit my whole query, I want those
documents that are fully satisfied  with some terms of the query.

In other words, I'm interested in an exact match from the point of view of
the document, not from the point of view of the query.

Asuka



Andre Bois-Crettez wrote
> (Your schema and query only appear on the nabble.com forum, it is mostly
> empty for me on the mailing list)
> 
> What you want is probably to change OR to AND:
> 
> params.set("q.op", "AND");







Re: SolrCloud High Availability during indexing operation

2013-09-24 Thread Walter Underwood
Did all of the curl update commands return success? Any errors in the logs?

wunder

On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote:

> Is it possible that some of those 80K docs were simply not valid? e.g.
> had a wrong field, had a missing required field, anything like that?
> What happens if you clear this collection and just re-run the same
> indexing process and do everything else the same?  Still some docs
> missing?  Same number?
> 
> And what if you take 1 document that you know is valid and index it
> 80K times, with a different ID, of course?  Do you see 80K docs in the
> end?
> 
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
> 
> 
> 
> On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena  wrote:
>> Doc count did not change after I restarted the nodes. I am doing a single
>> commit after all 80k docs. Using Solr 4.4.
>> 
>> Regards,
>> Saurabh
>> 
>> 
>> On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic <
>> otis.gospodne...@gmail.com> wrote:
>> 
>>> Interesting. Did the doc count change after you started the nodes again?
>>> Can you tell us about commits?
>>> Which version? 4.5 will be out soon.
>>> 
>>> Otis
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>> On Sep 23, 2013 8:37 PM, "Saurabh Saxena"  wrote:
>>> 
 Hello,
 
 I am testing High Availability feature of SolrCloud. I am using the
 following setup
 
 - 8 linux hosts
 - 8 Shards
 - 1 leader, 1 replica / host
 - Using Curl for update operation
 
 I tried to index 80K documents on replicas (10K/replica in parallel).
 During indexing process, I stopped 4 Leader nodes. Once indexing is done,
 out of 80K docs only 79808 docs are indexed.
 
 Is this an expected behaviour ? In my opinion replica should take care of
 indexing if leader is down.
 
 If this is an expected behaviour, any steps that can be taken from the
 client side to avoid such a situation.
 
 Regards,
 Saurabh Saxena
 
>>> 

--
Walter Underwood
wun...@wunderwood.org





Re: Select all descendants in a relation index

2013-09-24 Thread Oussama Mubarak

Thank you Erick.

I actually do need it to extend to grandchildren as stated in "I need to 
be able to find *all descendants* of a node with one query".
I already have an index that allows me to find the direct children of a 
node; what I need is to be able to get all descendants of a node
(children, grandchildren, etc.).


I have submitted this question on Stack Overflow, where I put in more
details:
http://stackoverflow.com/questions/18984183/join-query-in-apache-solr-how-to-get-all-levels-in-hierarchical-data


Semiaddict


On 24/09/2013 16:08, Erick Erickson wrote:

Sure, index the parent node id (perhaps multiple) with each child
and add &fq=parent_id:12.

you can do the reverse and index each node with its child node IDs
to ask the inverse question.

This won't extend to grandchildren/parents, but you haven't stated that you
need to do this.

Best,
Erick

On Mon, Sep 23, 2013 at 6:23 PM, Semiaddict  wrote:

Hello,

I am using Solr to index Drupal node relations (over 300k relations on over 
500k nodes), where each relation consists of the following fields:
- id : the id of the relation
- source_id : the source (parent) node id
- target_id : the target (child) node id

I need to be able to find all descendants of a node with one query.
So far I've managed to get direct children using the join syntax of Solr4 such 
as (http://wiki.apache.org/solr/Join):
/solr/collection/select?q={!join from=source_id to=target_id}source_id:12

Note that each node can have multiple parents and multiple children.

Is there a way to get all descendants of node 12 without having to create a 
loop in PHP to find all children, then all children of each child, etc ?
If not, is it possible to create a recursive query directly in Solr, or is 
there a better way to index tree structures ?

Any help or suggestion would be highly appreciated.

Thank you in advance,

Semiaddict




Re: [DIH] Logging skipped documents

2013-09-24 Thread Stefan Matheis
Jérôme

Just had a quick look at the source at 
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/XPathEntityProcessor.java?view=markup#l324
.. it looks like there is a LOG.warn(msg, e); statement on line 331, where msg 
should include the URL of the document that was tried?

Otherwise, if that's not the place where the exception happens .. you might be 
able to add LOG statements yourself and compile Solr from source (again) 
to make that work?

-Stefan  


On Monday, September 23, 2013 at 2:32 PM, jerome.dup...@bnf.fr wrote:

>  
> Hello,
>  
> I have a question: I index documents and a small part of them are skipped (I
> am in onError="skip" mode).
> I'm trying to get a list of them, in order to analyse what's wrong with
> these documents.
> Is there a means to get the list of skipped documents, and some more
> information? (my onError="skip" is in an XPathEntityProcessor; the name of
> the file processed would be OK)
>  
>  
> Regards,
> ---
> Jérôme Dupont
> Bibliothèque Nationale de France
> Département des Systèmes d'Information
> Tour T3 - Quai François Mauriac
> 75706 Paris Cedex 13
> telephone: 33 (0)1 53 79 45 40
> e-mail: jerome.dup...@bnf.fr (mailto:jerome.dup...@bnf.fr)
> ---
>  
>  
>  
> Take part in the Grande Collecte 1914-1918. Before printing, think of the
> environment.  



Re: Solr query processing

2013-09-24 Thread Erick Erickson
bq: As an aside, it would be nice if the queryparser could do the same
thing in Lucene

Lucene does not and (probably) will not ever know anything about the
schema. It's
purposely unaware of this higher-level construct. I wish you great good luck
persuading the lucene guys to have anything like a schema, you'll need it ;).

Best,
Erick



On Mon, Sep 23, 2013 at 9:44 PM, Otis Gospodnetic
 wrote:
> That's right.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Sep 23, 2013 12:55 PM, "Scott Smith"  wrote:
>
>> I just want to state a couple of things and hear someone say, "that's
>> right".
>>
>>
>> 1.   In a solr query you can have multiple fq's, but only a single q.
>>  And yes, I can simply AND the multiple "q"s together.  Just want to avoid
>> that if I'm wrong.
>>
>> 2.   A subtler issue is that when a full query is executed, Solr must
>> look at the schema to see how each field was tokenized (or not) and the
>> various other filters applied to a field so that it can properly transform
>> field data (e.g., tokenize the text, but not keywords).  As an aside, it
>> would be nice if the queryparser could do the same thing in Lucene (I know,
>> wrong forum :)).
>> Scott
>>


Re: Select all descendants in a relation index

2013-09-24 Thread Erick Erickson
Sure, index the parent node id (perhaps multiple) with each child
and add &fq=parent_id:12.

you can do the reverse and index each node with its child node IDs
to ask the inverse question.

This won't extend to grandchildren/parents, but you haven't stated that you
need to do this.

Best,
Erick

On Mon, Sep 23, 2013 at 6:23 PM, Semiaddict  wrote:
> Hello,
>
> I am using Solr to index Drupal node relations (over 300k relations on over 
> 500k nodes), where each relation consists of the following fields:
> - id : the id of the relation
> - source_id : the source (parent) node id
> - target_id : the target (child) node id
>
> I need to be able to find all descendants of a node with one query.
> So far I've managed to get direct children using the join syntax of Solr4 
> such as (http://wiki.apache.org/solr/Join):
> /solr/collection/select?q={!join from=source_id to=target_id}source_id:12
>
> Note that each node can have multiple parents and multiple children.
>
> Is there a way to get all descendants of node 12 without having to create a 
> loop in PHP to find all children, then all children of each child, etc ?
> If not, is it possible to create a recursive query directly in Solr, or is 
> there a better way to index tree structures ?
>
> Any help or suggestion would be highly appreciated.
>
> Thank you in advance,
>
> Semiaddict


Re: Indexing bulk loads of PDF files and extracting information from them

2013-09-24 Thread Erick Erickson
Consider using a SolrJ program, perhaps multiple
ones running in parallel.

See: http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
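
If the extraction itself should also go through Solr's ExtractingRequestHandler,
here is a rough SolrJ sketch posting one PDF at a time; it assumes the stock
/update/extract handler is enabled in solrconfig.xml, and the URL, directory
path, and field names are placeholders:

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class PdfIndexer {
      public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        for (File pdf : new File("/path/to/pdfs").listFiles()) {
          ContentStreamUpdateRequest req =
              new ContentStreamUpdateRequest("/update/extract");
          req.addFile(pdf, "application/pdf");
          // give each doc a unique id; prefix unknown Tika fields out of the way
          req.setParam("literal.id", pdf.getName());
          req.setParam("uprefix", "attr_");
          req.setParam("fmap.content", "text");
          server.request(req);
        }
        server.commit();
      }
    }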

Best,
Erick

On Mon, Sep 23, 2013 at 3:31 PM, Sadika Amreen  wrote:
> Hi all,
>
>
>
> I am looking to index the entire directory of PDF files. We have a very large 
> volume of PDFs (3000+, possibly much more), so adding them manually would be 
> cumbersome.
>
>
>
> I have seen more than a couple of dozen links explaining how to index PDFs 
> using Solr, but none were detailed enough to help me get started.
>
> I understand that indexing a word or PDF document requires the use of the 
> ExtractingRequestHandler which uses Apache Tika.
>
>
>
> My question is:  How do I configure the handler so that it can extract the 
> required information from bulk loads of PDFs?
>
> I know I am asking a broad question, but I am struggling to find good 
> guidance and something that would give me a step-by-step approach.
>
>
>
> There is an example configuration in the following link: 
> http://wiki.apache.org/solr/ExtractingRequestHandler
>
> I have also seen these threads:
>
> http://stackoverflow.com/questions/5947157/index-search-pdf-content-with-solr
>
> http://www.gossamer-threads.com/lists/lucene/general/158117
>
>
>
> I am still trying to understand the configuration process, so any concrete 
> help would be welcome.
>
>
>
> Thanks,
>
> Sadika Amreen
>
> Data Scientist
>
> PYA Analytics


Re: How to sort over all documents by score after Result Grouping / Field Collapsing

2013-09-24 Thread Erick Erickson
It's not clear what you're trying to do. Do you want to un-group the results?
By that I mean are you trying to take the grouped results you get back and
display them in one flat list ordered by score?

If that's the case, the simplest thing to do would be to do this on
the application
side with the results, it should be quite straight-forward.
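
A sketch of that application-side flattening with SolrJ's grouped-response
objects (assuming fl includes score, group.main=false, and group.limit covers
the docs you need):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.solr.client.solrj.response.Group;
    import org.apache.solr.client.solrj.response.GroupCommand;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class FlattenGroups {
      // pull every doc out of every group, then re-sort the flat list by score
      static List<SolrDocument> flatten(QueryResponse rsp) {
        List<SolrDocument> flat = new ArrayList<SolrDocument>();
        for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
          for (Group g : cmd.getValues()) {
            flat.addAll(g.getResult());
          }
        }
        Collections.sort(flat, new Comparator<SolrDocument>() {
          public int compare(SolrDocument a, SolrDocument b) {
            return ((Float) b.getFieldValue("score"))
                .compareTo((Float) a.getFieldValue("score"));
          }
        });
        return flat;
      }
    }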

And you have not stated the user-level problem you're trying to solve, this
may be an XY problem.

Best,
Erick

On Mon, Sep 23, 2013 at 1:49 PM, go2jun  wrote:
> Hi, I have solr documents like this:
>
>   <field ... indexed="true"/>
>   <field ... indexed="true"/>
>   <field ... indexed="true"/>
>
> I know I can use Solr Result Grouping / Field Collapsing to get the top 2
> results per group by grouping on source_id. Within each group, documents are
> sorted by score, with a query like this:
> http://localhost:8983/solr/select?q=bank&group.field=source_id&group=true&group.limit=2&group.main=true&sort=score
>
> My question is:
>
> 1. Is it possible to sort over all documents after I do the above grouping?
> 2. Are there any other ways to implement the above (by using Solr
> functions directly)?
> 3. Is it possible to implement this by writing Java code, something like a
> customized request handler?
>
> Thanks in advance,
>
> Jun
>
>
>
>
>


Re: explicit deltaimports by given ids

2013-09-24 Thread Stefan Matheis
Peter

You can access request params that way: ${dataimporter.request.command} (from 
https://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters) - 
although I'm not sure what happens if you provide the same param multiple times.

Perhaps I'd go with &oid=5,6 as the URL param and use ".. WHERE oid IN( 
${dataimporter.request.oid} ) .." in the query?

-Stefan  


On Friday, September 13, 2013 at 3:37 PM, Peter Schütt wrote:

> Hello,
> I want to trigger a deltaimportquery by given IDs.
>  
> Example:
>  
> query="select oid, att1, att2 from my_table"
>  
> deltaImportQuery="select oid, att1, att2 from my_table  
> WHERE oid=${dih.delta.OID}"
>  
> deltaQuery="select OID from my_table WHERE
> TIME_STAMP > TO_DATE
> (${dih.last_index_time:VARCHAR}, 'YYYY-MM-DD HH24:MI:SS')"
>  
> deletedPkQuery="select OID from my_table
> where TIME_STAMP > TO_DATE(${dih.last_index_time:VARCHAR}, 'YYYY-MM-DD
> HH24:MI:SS')"
>  
>  
> Pseudo URL:  
>  
> http://solr-server/solr/mycore/dataimport/?command=deltaImportQuery&&oid=5
> &&oid=6
>  
> to trigger the update or insert of the datasets with OID in (5, 6).
>  
> What is the correct way?
>  
> Thanks for any hint.
>  
> Ciao
> Peter Schütt
>  
>  




Re: SolrCloud High Availability during indexing operation

2013-09-24 Thread Otis Gospodnetic
Is it possible that some of those 80K docs were simply not valid? e.g.
had a wrong field, had a missing required field, anything like that?
What happens if you clear this collection and just re-run the same
indexing process and do everything else the same?  Still some docs
missing?  Same number?

And what if you take 1 document that you know is valid and index it
80K times, with a different ID, of course?  Do you see 80K docs in the
end?
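
Something like this SolrJ loop would run that experiment (a sketch; the URL
and field names are placeholders):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexTest {
      public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // index one known-good document 80K times under different IDs
        for (int i = 0; i < 80000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "ha-test-" + i);
          doc.addField("title", "known good document");
          server.add(doc);
        }
        server.commit();  // single commit, matching the original test
      }
    }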

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena  wrote:
> Doc count did not change after I restarted the nodes. I am doing a single
> commit after all 80k docs. Using Solr 4.4.
>
> Regards,
> Saurabh
>
>
> On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>> Interesting. Did the doc count change after you started the nodes again?
>> Can you tell us about commits?
>> Which version? 4.5 will be out soon.
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On Sep 23, 2013 8:37 PM, "Saurabh Saxena"  wrote:
>>
>> > Hello,
>> >
>> > I am testing High Availability feature of SolrCloud. I am using the
>> > following setup
>> >
>> > - 8 linux hosts
>> > - 8 Shards
>> > - 1 leader, 1 replica / host
>> > - Using Curl for update operation
>> >
>> > I tried to index 80K documents on replicas (10K/replica in parallel).
>> > During indexing process, I stopped 4 Leader nodes. Once indexing is done,
>> > out of 80K docs only 79808 docs are indexed.
>> >
>> > Is this an expected behaviour ? In my opinion replica should take care of
>> > indexing if leader is down.
>> >
>> > If this is an expected behaviour, any steps that can be taken from the
>> > client side to avoid such a situation.
>> >
>> > Regards,
>> > Saurabh Saxena
>> >
>>


Re: Soft commit and flush

2013-09-24 Thread Otis Gospodnetic
Hi,

I believe data is not fsynched to disk until a hard commit (and even
then disks can lie to you and tell you data is safe even though it's
still in disk cache waiting to really be written to the medium) ,
which is why you can lose it between hard commits.  Soft commits just
make newly added docs visible in search results.
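
In SolrJ terms, a small sketch (the softCommit flag on SolrServer.commit
exists in the 4.x API):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CommitFlavors {
      public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // soft commit: open a new searcher so recent docs become visible,
        // but don't fsync segments to stable storage
        server.commit(true, true, true);
        // hard commit: fsync index files so the data survives a crash
        server.commit(true, true, false);
      }
    }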

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Sep 24, 2013 at 7:51 AM, adfel70  wrote:
> I am struggling to get a deep understanding of soft commit.
> I have read  Erick's post
> 
> which helped me a lot with when and why we should call each type of commit.
> But still, I can't understand what exactly happens when we call soft commit:
> I mean, is the new data flushed, fsynced, or held in RAM... ?
> I tried to test it myself and I got 2 different behaviours:
> a. If I just had 1 document that was added to the index, soft commit did not
> cause index files to change.
> b. If I had a big change (addition of about 100,000 docs, ~5MB tlog file),
> calling the soft commit DID change the index files - so I guess that soft
> commit caused an fsync.
>
> My conclusion is that soft commit always flushes the data, but because of
> the implementation of NRTCachingDirectoryFactory, the data will be written
> to the disk when it gets too big.
>
> Can someone please correct me?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Soft-commit-and-flush-tp4091726.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Javascript StatelessScriptUpdateProcessor

2013-09-24 Thread Stefan Matheis
Luís, would you mind sharing your findings for others / the archive?

On Tuesday, September 10, 2013 at 6:49 PM, Luís Portela Afonso wrote:

> Solved
> On Sep 10, 2013, at 4:55 PM, Luís Portela Afonso  (mailto:meligalet...@gmail.com)> wrote:
>  
> > Is it possible to execute queries from a JavaScript script in a
> > StatelessScriptUpdateProcessor?
> > I'm processing data with a JavaScript script and I want to execute a query
> > against the data indexed in Solr.
> >
> > I know that the JavaScript script has an instance of SolrQueryRequest and
> > SolrQueryResponse, but neither can be used; at least, I'm not able to
> > use them.
>  
>  
>  




Search statistics in category scale

2013-09-24 Thread Marina
I need to implement further functionality; a picture of it is attached below.
I already have a running application based on Solr search.
In a few words: the drop-down will contain similar search phrases
within a concrete category, along with the number of items found.
Does Solr provide a way to collect such data and retrieve it?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-statistics-in-category-scale-tp4091734.html
Sent from the Solr - User mailing list archive at Nabble.com.


How can i search maximum number of word in particular docs

2013-09-24 Thread Viresh Modi
My query looks like:

start=0&rows=10&hl=true&hl.fl=content&qt=dismax
&q=pookan
&fl=id,application,timestamp,name,score,metaData,metaDataDate
&fq=application:OnlineR3_6_4
&fq=(metaData:channelId/101 OR metaData:channelId/104)
&sort=score desc


but I am not getting the desired results:

 OnlineR3_6_4_101_7
 pookan pookan pookan


OnlineR3_6_4_101_20
 pookan pookan pookan pookan pookan


 OnlineR3_6_4_101_19
 pookan pookan pookan pookan


  OnlineR3_6_4_101_21
 pookan pookan



Actually, I want the document in which the particular word matches the most
times in the content field to come first (relevance-based).


RE: Solr DIH call a java class

2013-09-24 Thread Dyer, James
You probably want to write a custom Transformer.  See: 
http://wiki.apache.org/solr/DIHCustomTransformer

Or maybe a custom Evaluator.  See:  
http://wiki.apache.org/solr/DataImportHandler#Evaluators_-_Custom_formatting_in_queries_and_urls

Possibly one or more of the built-in Transformers will do the job.  See:  
http://wiki.apache.org/solr/DataImportHandler#Transformer
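
As a rough sketch, a custom Transformer is just a class with a transformRow
method; the class and column names here are hypothetical:

  package com.example.dih;

  import java.util.Map;
  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  // Hypothetical example: trims and upper-cases a "name" column on each row.
  public class TrimNameTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
      Object name = row.get("name");
      if (name != null) {
        row.put("name", name.toString().trim().toUpperCase());
      }
      return row; // the returned row is what DIH goes on to index
    }
  }

You then reference it on the entity, e.g.
<entity name="item" transformer="com.example.dih.TrimNameTransformer" ...>.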

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Prasi S [mailto:prasi1...@gmail.com] 
Sent: Tuesday, September 24, 2013 5:26 AM
To: solr-user@lucene.apache.org
Subject: Solr DIH call a java class

Hi,
Can we call a Java class inside a Solr data-config.xml file, similar to
calling a script function?

I have a few manipulations to do before sending data via the DataImportHandler.
For each row, can I pass that row to a Java class in the same way we pass
it to a script function?


Thanks,
Prasi



RE: Using CachedSqlEntityProcessor with delta imports in DIH

2013-09-24 Thread Dyer, James
I think delta imports only work on the parent entity, and cached child entities 
will load in full even if you only need to look up a few rows for the delta.  
Others, though, might have a way to get this to work.

Here's two possible workarounds.

On the child entity, specify:  

When it is a full import, pass the parameter cache.impl=SortedMapBackedCache. 
For delta imports, leave this blank.  This (I think) will give you a cache for 
the full import and no cache for the deltas.

Another workaround is to include a subquery on your delta import like this:
Select * from table ${delta.subquery}
When it is a delta import, pass the parameter: delta.subquery=where 
blah in (select blah from parent_table ...)

This will cause it to cache only the entries needed for that delta import.
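
A sketch of that second workaround (table and column names invented; note
that request parameters are typically referenced through the
dataimporter.request namespace inside data-config.xml):

  <entity name="child"
          processor="CachedSqlEntityProcessor"
          cacheKey="oid" cacheLookup="parent.oid"
          query="select * from child_table ${dataimporter.request.delta.subquery}"/>

and on the delta import you pass the subquery as a (URL-encoded) parameter:

  /dataimport?command=delta-import&delta.subquery=where oid in (select oid from parent_table where ...)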

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: david.r.laroche...@gmail.com [mailto:david.r.laroche...@gmail.com] On 
Behalf Of David Larochelle
Sent: Monday, September 23, 2013 5:22 PM
To: solr-user
Subject: Using CachedSqlEntityProcessor with delta imports in DIH

I'm trying to use the CachedSqlEntityProcessor on a child entity that also
has a delta query.

Full imports and delta imports of the parent entity work fine; however, delta
imports for the child entity have no effect. If I remove the
processor="CachedSqlEntityProcessor" attribute from the child entity, the
delta import works flawlessly but the full import is very slow.
Here's my data-config.xml:



  http://www.w3.org/2001/XInclude"/>
  

  
  

  



I need to be able to run delta imports based on the media_tags_map table in
addition to the story_sentences table.

Any idea why delta imports for media_tags_map won't work when the
CachedSqlEntityProcessor is used?

I've searched extensively but can't find an example that uses both
CachedSqlEntityProcessor and deltaQuery on the sub-entity or any
explanation of why the above configuration won't work as expected.

--

Thanks,

David


RE: solr4.4 admin page show "loading"

2013-09-24 Thread Ramesh
Use Mozilla (Firefox) for better results; even in IE it does not work properly.

-Original Message-
From: William Bell [mailto:billnb...@gmail.com] 
Sent: Tuesday, September 24, 2013 12:02 PM
To: solr-user@lucene.apache.org
Subject: Re: solr4.4 admin page show "loading"

Use Chrome.


On Thu, Sep 19, 2013 at 7:32 AM, Micheal Chao wrote:

> Hi, I have installed Solr 4.4 on Tomcat 7.0. The problem is I can't see 
> the Solr admin page; it always shows "loading". I can't find any 
> error in the Tomcat logs, and I can send search requests and get results.
>
> What can I do? Please help me, thank you very much.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr4-4-admin-page-show-loading-tp4
> 091039.html Sent from the Solr - User mailing list archive at 
> Nabble.com.
>



--
Bill Bell
billnb...@gmail.com
cell 720-256-8076




Soft commit and flush

2013-09-24 Thread adfel70
I am struggling to get a deep understanding of soft commit.
I have read Erick's post, which helped me a lot with when and why we should
call each type of commit. But still, I can't understand what exactly happens
when we call soft commit: I mean, is the new data flushed, fsynced, or held
in RAM?
I tried to test it myself and I got 2 different behaviours: 
a. If I just had 1 document that was added to the index, soft commit did not
cause index files to change.
b. If I had a big change (addition of about 100,000 docs, ~5MB tlog file),
calling the soft commit DID change the index files - so I guess that soft
commit caused an fsync.

My conclusion is that soft commit always flushes the data, but because of
the implementation of NRTCachingDirectoryFactory, the data will be written
to the disk when it gets too big.

Can someone please correct me?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Soft-commit-and-flush-tp4091726.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Complex query combining fq and q with join

2013-09-24 Thread marotosg
I found the solution.


http://dzoessolr020:8080/solr4/person/select/?
&q= 
(
( ( GenderSFD:Male )
AND {!join from=PersonID to=CoreID fromIndex=personjob
v='((CoCompanyName:"hospital") OR (PoPositionsAllS:"developer"))'} 

AND {!join from=DocPersonAttachS to=CoreID fromIndex=document v='(DocNameS:"
PeterRES")'} 
)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Complex-query-combining-fq-and-q-with-join-tp4091563p4091725.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: searching within documents

2013-09-24 Thread Nutan
Why does it happen that for some words it shows output and for others it does
not?

For example,
1)
q=contents:Sushant

numfound is 0

q=contents:sushant

gives output

2)
q=contents:acted

numfound 0

q=contents:well

gives output

This is the document:

<doc>
  <field name="id">13</field>
  <field name="author">chetan</field>
  <field name="comments">worst book</field>
  <field name="keywords">solr,lucene</field>
  <field name="contents">Sushant acted well in kaipoche.</field>
  <field name="title">3 mistakes</field>
  <field name="revision_number">0012345654334</field>
</doc>



Please do reply. Help will be appreciated.
Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-within-documents-tp4090173p4091713.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr DIH call a java class

2013-09-24 Thread Prasi S
Hi,
Can we call a Java class inside a Solr data-config.xml file, similar to
calling a script function?

I have a few manipulations to do before sending data via the DataImportHandler.
For each row, can I pass that row to a Java class in the same way we pass
it to a script function?


Thanks,
Prasi


RE: searching within documents

2013-09-24 Thread Nutan
Okay thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-within-documents-tp4090173p4091705.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: searching within documents

2013-09-24 Thread Gupta, Abhinav
It's not always the case that changing schema.xml requires re-indexing. 
For example, if you add a tokenizer to the query analyzer, you don't need to re-index. 

But in the case below, I suppose your schema changes affect index time, so 
you need to re-index.
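
For example, a field type declared with separate index-time and query-time
analyzers might look like this (a generic sketch, not your actual schema):

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Edits under analyzer type="query" take effect on the next query, while edits
under analyzer type="index" only affect documents indexed afterwards, which
is why index-time changes require a full re-index.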

The ordering of documents depends entirely on each document's relevance (score). 
Hope it helps.

Thanks,
Abhinav

-Original Message-
From: Nutan [mailto:nutanshinde1...@gmail.com] 
Sent: 24 September 2013 14:34
To: solr-user@lucene.apache.org
Subject: Re: searching within documents

First I indexed documents by sending docs to Solr as XML files.
Then I made changes to schema.xml, i.e. I added an analyzer and a tokenizer.
I then indexed some new documents using the same procedure; now my searching with 
spaces works only for newly indexed files and not the initial files.
Is it true that after making changes to schema.xml, re-indexing is necessary?

Is it the case that searching for some words works and for others it does not? 
For example, when I query:
q=contents:used

output:numfound=0

and for
q=contents:for
 
output:
 "response":{"numFound":2,"start":0,"docs":[
  {
"id":"7",
"author":["nutan"],
"comments":"best book",
"keywords":"solr,lucene",
"contents":"solr,lucene is used for search based service.",
"title":"solr cookbook 3.1",
"revision_number":"0012345654334"},
  {
"id":"8",
"author":["nutan shinde"],
"comments":"best book for solr",
"keywords":"solr,lucene,apache tika",
"contents":"solr,lucene is used for search based service.Google works 
uses web crawler.Lucene can implelment web crawler",
"title":"solr enterprise search server",
"revision_number":"00123467889767"}]
  }}

my schema.xml is:

<uniqueKey>id</uniqueKey>

Also, for each of the queries:
contents:for
contents:search

the sequence in which documents occur changes. What is the reason for this?
How are the documents retrieved? Does it depend on the number of indexes?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-within-documents-tp4090173p4091697.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: searching within documents

2013-09-24 Thread Gora Mohanty
On 24 September 2013 14:34, Nutan  wrote:
> First I indexed documents using "indexing xml files to solr(sending doc to
> solr using xml file)"
> Then I made changes to schema.xml ie. I added analyzer and tokenizer.
> I then indexed some new documents using same procedure,now my searching with
> spaces works only for newly indexed files and not the initial files.
> Is it true that, after making changes to schema.xml re-indexing is
> necessary??
[...]

Yes, it is required.

Regards,
Gora


Re: searching within documents

2013-09-24 Thread Nutan
First I indexed documents by sending docs to Solr as XML files.
Then I made changes to schema.xml, i.e. I added an analyzer and a tokenizer.
I then indexed some new documents using the same procedure; now my searching with
spaces works only for newly indexed files and not the initial files.
Is it true that after making changes to schema.xml, re-indexing is
necessary?

Is it the case that searching for some words works and for others it does not?
For example, when I query:
q=contents:used

output:numfound=0

and for
q=contents:for
 
output:
 "response":{"numFound":2,"start":0,"docs":[
  {
"id":"7",
"author":["nutan"],
"comments":"best book",
"keywords":"solr,lucene",
"contents":"solr,lucene is used for search based service.",
"title":"solr cookbook 3.1",
"revision_number":"0012345654334"},
  {
"id":"8",
"author":["nutan shinde"],
"comments":"best book for solr",
"keywords":"solr,lucene,apache tika",
"contents":"solr,lucene is used for search based service.Google
works uses web crawler.Lucene can implelment web crawler",
"title":"solr enterprise search server",
"revision_number":"00123467889767"}]
  }}

my schema.xml is:

<uniqueKey>id</uniqueKey>


Also, for each of the queries:
contents:for
contents:search

the sequence in which documents occur changes. What is the reason for this?
How are the documents retrieved? Does it depend on the number of indexes?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-within-documents-tp4090173p4091697.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud setup - any advice?

2013-09-24 Thread Neil Prosser
Shawn: unfortunately the current problems are with facet.method=enum!

Erick: We already round our date queries so they're the same for at least
an hour so thankfully our fq entries will be reusable. However, I'll take a
look at reducing the cache and autowarming counts and see what the effect
on hit ratios and performance is.
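
For reference, those knobs live in solrconfig.xml; a trimmed-down sketch
along the lines Erick suggests (sizes illustrative only):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="16"/>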

For SolrCloud our soft commit (openSearcher=false) interval is 15 seconds
and our hard commit is 15 minutes.

You're right about those sorted fields having a lot of unique values. They
can be any number between 0 and 10,000,000 (it's sparsely populated across
the documents) and could appear in several variants across multiple
documents. This is probably a good area for seeing what we can bend with
regard to our requirements for sorting/boosting. I've just looked at two
shards and they've each got upwards of 1000 terms showing in the schema
browser for one (potentially out of 60) fields.



On 21 September 2013 20:07, Erick Erickson  wrote:

> About caches. The queryResultCache is only useful when you expect there
> to be a number of _identical_ queries. Think of this cache as a map where
> the key is the query and the value is just a list of N document IDs
> (internal)
> where N is your window size. Paging is often the place where this is used.
> Take a look at your admin page for this cache, you can see the hit rates.
> But, the take-away is that this is a very small cache memory-wise, varying
> it is probably not a great predictor of memory usage.
>
> The filterCache is more intense memory wise, it's another map where the
> key is the fq clause and the value is bounded by maxDoc/8. Take a
> close look at this in the admin screen and see what the hit ratio is. It
> may
> be that you can make it much smaller and still get a lot of benefit.
> _Especially_ considering it could occupy about 44G of memory.
> (43,000,000 / 8) * 8192. And the autowarm count is excessive in
> most cases from what I've seen. Cutting the autowarm down to, say, 16
> may not make a noticeable difference in your response time. And if
> you're using NOW in your fq clauses, it's almost totally useless, see:
> http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
>
> Also, read Uwe's excellent blog about MMapDirectory here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> for some problems with over-allocating memory to the JVM. Of course
> if you're hitting OOMs, well.
>
> bq: order them by one of their fields.
> This is one place I'd look first. How many unique values are in each field
> that you sort on? This is one of the major memory consumers. You can
> get a sense of this by looking at admin/schema-browser and selecting
> the fields you sort on. There's a text box with the number of terms
> returned,
> then a / ### where ### is the total count of unique terms in the field.
> NOTE:
> in 4.4 this will be -1 for multiValued fields, but you shouldn't be
> sorting on
> those anyway. How many fields are you sorting on anyway, and of what types?
>
> For your SolrCloud experiments, what are your soft and hard commit
> intervals?
> Because something is really screwy here. Your sharding moving the
> number of docs down this low per shard should be fast. Back to the point
> above, the only good explanation I can come up with from this remove is
> that the fields you sort on have a LOT of unique values. It's possible that
> the total number of unique values isn't scaling with sharding. That is,
> each
> shard may have, say, 90% of all unique terms (number from thin air). Worth
> checking anyway, but a stretch.
>
> This is definitely unusual...
>
> Best,
> Erick
>
>
> On Thu, Sep 19, 2013 at 8:20 AM, Neil Prosser 
> wrote:
> > Apologies for the giant email. Hopefully it makes sense.
> >
> > We've been trying out SolrCloud to solve some scalability issues with our
> > current setup and have run into problems. I'd like to describe our
> current
> > setup, our queries and the sort of load we see and am hoping someone
> might
> > be able to spot the massive flaw in the way I've been trying to set
> things
> > up.
> >
> > We currently run Solr 4.0.0 in the old style Master/Slave replication. We
> > have five slaves, each running Centos with 96GB of RAM, 24 cores and with
> > 48GB assigned to the JVM heap. Disks aren't crazy fast (i.e. not SSDs)
> but
> > aren't slow either. Our GC parameters aren't particularly exciting, just
> > -XX:+UseConcMarkSweepGC. Java version is 1.7.0_11.
> >
> > Our index size ranges between 144GB and 200GB (when we optimise it back
> > down, since we've had bad experiences with large cores). We've got just
> > over 37M documents some are smallish but most range between 1000-6000
> > bytes. We regularly update documents so large portions of the index will
> be
> > touched leading to a maxDocs value of around 43M.
> >
> > Query load ranges between 400req/s to 800req/s across the five slaves
> > throughout the day, increasing and decreasing gradually 

Re: Hash range to shard assignment

2013-09-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
That is in the pipeline, within the next 3-4 months for sure.

On Mon, Sep 23, 2013 at 11:07 PM, lochri  wrote:
> Yes, actually that would be a very comfortable solution.
> Is that planned? And if so, when will it be released?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204p4091591.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
-
Noble Paul


Re: requested url solr/update/extract not available on this server

2013-09-24 Thread Nutan
The rest of the queries work, and I have added the following in solrconfig.xml:


<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>




On Sun, Sep 22, 2013 at 8:53 PM, Erick Erickson [via Lucene] <
ml-node+s472066n4091440...@n3.nabble.com> wrote:

> Please review:
>
> http://wiki.apache.org/solr/UsingMailingLists
>
> Erick
>
> On Sun, Sep 22, 2013 at 5:52 AM, Nutan <[hidden 
> email]>
> wrote:
>
> > I did define the request handler.
> >
> >
> > On Sun, Sep 22, 2013 at 12:51 AM, Erick Erickson [via Lucene] <
> > [hidden email] >
> wrote:
> >
> >> bq: And im not using the example config file
> >>
> >> It looks like you have not included the request handler in your
> >> solrconfig.xml,
> >> something like (from the stock distro):
> >>
> >>   <requestHandler name="/update/extract"
> >>                   startup="lazy"
> >>                   class="solr.extraction.ExtractingRequestHandler" >
> >>     <lst name="defaults">
> >>       <str name="lowernames">true</str>
> >>       <str name="uprefix">ignored_</str>
> >>
> >>       <str name="captureAttr">true</str>
> >>       <str name="fmap.a">links</str>
> >>       <str name="fmap.div">ignored_</str>
> >>     </lst>
> >>   </requestHandler>
> >>
> >> I'd start with the stock config and try removing things one-by-one...
> >>
> >> Best,
> >> Erick
> >>
> >> On Sat, Sep 21, 2013 at 7:34 AM, Nutan <[hidden email]<
> http://user/SendEmail.jtp?type=node&node=4091391&i=0>>
> >> wrote:
> >>
> >> > Yes, I do get the Solr admin page. And I'm not using the example config
> >> > file; I have created my own for my project as required. I have also
> >> > defined update/extract in solrconfig.xml.
> >> >
> >> >
> >> > On Tue, Sep 17, 2013 at 4:45 AM, Chris Hostetter-3 [via Lucene] <
> >> > [hidden email] >
>
> >> wrote:
> >> >
> >> >>
> >> >> : Is /solr/update working?
> >> >>
> >> >> more importantly: does "/solr/" work in your browser and return
> >> anything
> >> >> useful?  (nothing you've told us yet gives us anyway of knowning if
> >> >> solr is even up and running)
> >> >>
> >> >> if 'http://localhost:8080/solr/' shows you the solr admin UI, and
> you
> >> are
> >> >> using the stock Solr 4.2 example configs, then
> >> >> http://localhost:8080/solr/update/extract should not give you a 404
> >> >> error.
> >> >>
> >> >> if however you are using some other configs, it might not work
> unless
> >> >> those configs register a handler with the path /update/extract.
> >> >>
> >> >> Using the jetty setup provided with 4.2, and the example configs
> (from
> >> >> 4.2) I was able to index a sample PDF just fine using your curl
> >> command...
> >> >>
> >> >> hossman@frisbee:~/tmp$ curl "
> >> >> http://localhost:8983/solr/update/extract?literal.id=1&commit=true";
> -F
> >> >> "myfile=@stump.winners.san.diego.2013.pdf"
> >> >> <?xml version="1.0" encoding="UTF-8"?>
> >> >> <response>
> >> >> <lst name="responseHeader"><int name="status">0</int><int name="QTime">1839</int></lst>
> >> >> </response>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> :
> >> >> : Check solrconfig to see that /update/extract is configured as in
> the
> >> >> standard
> >> >> : Solr example.
> >> >> :
> >> >> : Does /solr/update/extract work for you using the standard Solr
> >> example?
> >> >> :
> >> >> : -- Jack Krupansky
> >> >> :
> >> >> : -Original Message- From: Nutan
> >> >> : Sent: Sunday, September 15, 2013 2:37 AM
> >> >> : To: [hidden email]<
> >> http://user/SendEmail.jtp?type=node&node=4090459&i=0>
> >> >> : Subject: requested url solr/update/extract not available on this
> >> server
> >> >> :
> >> >> : I am working on Solr 4.2 on Windows 7. I am trying to index PDF
> >> >> : files. I referred to Solr Cookbook 4. Tomcat is using port 8080. I
> >> >> : get this error: requested url solr/update/extract not available on
> >> >> : this server
> >> >> : When my curl is:
> >> >> : curl "
> >> http://localhost:8080/solr/update/extract?literal.id=1&commit=true";
> >> >> -F
> >> >> : "myfile=@cookbook.pdf"
> >> >> : There is no entry in log files. Please help.
> >> >> :
> >> >> :
> >> >> :
> >> >> : --
> >> >> : View this message in context:
> >> >> :
> >> >>
> >>
> http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153.html
> >> >> : Sent from the Solr - User mailing list archive at Nabble.com.
> >> >> :
> >> >>
> >> >> -Hoss
> >> >>
> >> >>

RE: SolrParams to and from NamedList

2013-09-24 Thread Peter Kirk
Hi - I think there is a bug in the conversion methods for SolrParams. But it 
seems that using ModifiableSolrParams (to add and remove parameters and values, 
which is what I want to do) is the way to go.
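
A sketch of that approach inside handleRequestBody (the parameter names are
just examples; ModifiableSolrParams lives in org.apache.solr.common.params):

  ModifiableSolrParams newParams = new ModifiableSolrParams(req.getParams());
  newParams.set("rows", "20");           // replace any existing value(s)
  newParams.add("fq", "author:nutan");   // append an additional value
  newParams.remove("debugQuery");        // drop a parameter entirely
  req.setParams(newParams);
  super.handleRequestBody(req, rsp);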

/Peter


-Original Message-
From: Peter Kirk [mailto:p...@alpha-solutions.dk] 
Sent: 23. september 2013 21:09
To: solr-user@lucene.apache.org
Subject: SolrParams to and from NamedList

Hi,

In a request handler, if I run the code below, I get an exception from Solr: 
undefined field "[Ljava.lang.String;@41061b68"

It appears the conversion between SolrParams and NamedList and back again fails 
if one of the parameters is an array, e.g. a parameter that appears with 
multiple values, such as category and author.


public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) 
throws Exception {

  SolrParams params = req.getParams();
  // multi-valued parameters end up as String[] entries in the NamedList
  NamedList<Object> parameterList = params.toNamedList();
  // toSolrParams() appears to call toString() on each value, mangling arrays
  SolrParams newSolrParams = SolrParams.toSolrParams(parameterList);

  req.setParams(newSolrParams);
  super.handleRequestBody(req, rsp);
}


How can I make this conversion work correctly?

Thanks.


Re: Get only those documents that are fully satisfied.

2013-09-24 Thread Andre Bois-Crettez

(Your schema and query only appear on the nabble.com forum; they are mostly
empty for me on the mailing list.)

What you want is probably to change OR to AND:

params.set("q.op", "AND");
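
or, equivalently, on the request itself (an illustrative query):

  q=blue running shoes&q.op=AND

which matches only documents containing all three terms.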


André
On 09/23/2013 04:44 PM, asuka wrote:

Hi Jack,
I've been working with the following schema field analyzer:



Regarding the query, the one I'm using right now is:



But with this query, I just get results based on the presence of any of the
words in the sentence.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-only-those-documents-that-are-fully-satisfied-tp4091531p4091565.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
André Bois-Crettez

Software Architect
Search Developer
http://www.kelkoo.com/


Kelkoo SAS
A simplified joint-stock company (Société par Actions Simplifiée)
Share capital: €4,168,964.30
Registered office: 8, rue du Sentier, 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively for 
their addressees. If you are not the intended recipient of this message, please 
delete it and notify the sender.