Indexing strategies?
Hi, I'm facing a dilemma in choosing an indexing strategy. My application architecture is:
- I have a listing table in my DB
- For each listing, I make 3 calls to a URL datasource in a different system

I have 200k records. Indexing 25 docs takes 1 minute, so 200k might take more than 100 hours :-( I know there are a lot of factors to consider, from the network to the DB. I'm looking for different strategies we could use to index:
- Can we run multiple data import handlers? One data-config for the first 100k and a second one for the other 100k?
- Would it be possible to write a Java service using SolrJ and make multi-threaded calls to Solr to index?
- The URL datasources I'm using actually reside in the MSSQL database of a different system. Could I speed up indexing by using a JDBCDataSource that queries the DB directly instead of going through the API URL datasource?

Are there any other strategies we could use? Thank you,
Phonetic search on multiple fields
Hi, I am a beginner with Solr, and I am trying to implement phonetic search in my application. My fieldType definitions in schema.xml are:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="0" generateWordParts="1" stemEnglishPossessive="0" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And:

<fieldType name="text_general_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="0" generateWordParts="1" stemEnglishPossessive="0" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And the field definitions:

<field name="fname" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="fname_copy" type="text_general_phonetic" indexed="true" stored="true" required="false"/>
<copyField source="fname" dest="fname_copy"/>

When I search for "stephen" it gives me "stephen", but searching "stifn" (which should match phonetically) does not work... Also, how can I use a phonetic filter with the DoubleMetaphone encoder? Please help me. Thanks in advance.
Indexing in the case of entire shard failure
We have a system with 2 shards, where every shard has a leader and one replica. During indexing, one of the shards (both leader and replica) was shut down. We got two types of HTTP responses: Service Unavailable and OK. From this we concluded that the shard which was still up kept on indexing. Is this behavior correct? We expected that if an entire shard fails, the system stops indexing completely.
Re: Indexing in the case of entire shard failure
Note: In SolrCloud terminology, a leader is also a replica. IOW, you have two replicas, one of which (and it can vary over time) is elected as leader for that shard. The other shards remain capable of indexing even if one shard becomes unavailable. That is expected - and desired - behavior in a fault-tolerant, fully-distributed system. Your application can/should make its own decision as to what it will do if an indexing operation cannot be serviced.

-- Jack Krupansky

-----Original Message----- From: elmerfudd Sent: Wednesday, February 12, 2014 7:54 AM To: solr-user@lucene.apache.org Subject: Indexing in the case of entire shard failure [...]
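For illustration, a minimal SolrJ sketch of that application-side decision; the retry queue and helper method are assumptions for the example, not anything prescribed by Solr:

import java.io.IOException;
import java.util.Queue;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

// Hypothetical policy: if an add cannot be serviced (e.g. the target shard is
// entirely down), park the document locally and retry once the shard recovers.
void addOrQueue(SolrServer solr, SolrInputDocument doc, Queue<SolrInputDocument> retryQueue) {
    try {
        solr.add(doc);
    } catch (SolrServerException | IOException e) {
        retryQueue.offer(doc);  // application-level decision, not Solr's
    }
}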
Searching phonetic by DoubleMetaphone soundex encoder
Hi, I am using Solr to search for phonetically equivalent strings. My schema contains:

<fieldType name="text_general_doubleMetaphone" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and the fields are:

<field name="fname" type="text_general" indexed="true" stored="true" required="false"/>
<field name="fname_sound" type="text_general_doubleMetaphone" indexed="true" stored="true" required="false"/>
<copyField source="fname" dest="fname_sound"/>

It works when I search "stfn" ===> stephen, stephn. But I am also expecting "stephn" ===> stephen. How will I get this result? Am I doing something wrong? Thanks in advance.
Solr performance with commitWithin seems too good to be true. I am afraid I am missing something
I am running a very simple performance experiment where I post 2000 documents to my application, which in turn persists them to a relational DB and sends them to Solr for indexing (synchronously, in the same request). I am testing 3 use cases:

1. No indexing at all - ~45 sec to post 2000 documents
2. Indexing included - commit after each add - ~8 minutes (!) to post and index 2000 documents
3. Indexing included - commitWithin 1ms - ~55 seconds (!) to post and index 2000 documents

The 3rd result does not make any sense; I would expect the behavior to be similar to the one in point 2. At first I thought that the documents were not really committed, but I could actually see them being added by executing some queries during the experiment (via the Solr web UI). I am worried that I am missing something very big. The code I use for point 2:

SolrInputDocument doc = ... // get doc
SolrServer solrConnection = ... // get connection
solrConnection.add(doc);
solrConnection.commit();

Whereas the code for point 3:

SolrInputDocument doc = ... // get doc
SolrServer solrConnection = ... // get connection
solrConnection.add(doc, 1); // according to the API documentation, there is no need to explicitly call commit with this API

Is it possible that committing after each add will degrade performance by a factor of 40?
Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something
Doing a standard commit after every document is a Solr anti-pattern. commitWithin is a "near-realtime" commit in recent versions of Solr and not a standard commit. https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching

- Mark

http://about.me/markrmiller

On Feb 12, 2014, at 9:52 AM, Pisarev, Vitaliy <vitaliy.pisa...@hp.com> wrote: [...]
Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something
Yes, committing after each document will greatly degrade performance. I typically use autoCommit and autoSoftCommit to set the time interval between commits, but commitWithin should have a similar effect. I often see performance of 2000+ docs per second on the load when using auto commits. When explicitly committing after each document, your commits will happen far too frequently, overworking the indexing process.

Joel Bernstein
Search Engineer at Heliosearch

On Wed, Feb 12, 2014 at 9:52 AM, Pisarev, Vitaliy <vitaliy.pisa...@hp.com> wrote: [...]
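For reference, a minimal solrconfig.xml sketch of the autoCommit/autoSoftCommit combination described above; the intervals are illustrative assumptions and should be tuned to your own visibility and durability needs:

<!-- hard commit: flush to disk periodically, without opening a new searcher -->
<autoCommit>
  <maxTime>600000</maxTime>          <!-- every 10 minutes (illustrative) -->
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- soft commit: make newly added documents visible to searches -->
<autoSoftCommit>
  <maxTime>1000</maxTime>            <!-- every second (illustrative) -->
</autoSoftCommit>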
Newb - Search not returning any results
I set up a Solr core and populated it with documents, but I am not able to get any results when attempting to search the documents. A generic search (q=*) returns all documents (and the fields/values within those documents); however, when I try to search using specific criteria I get no results back.

I have the following setup in my schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="mailingcity" type="string" indexed="true" stored="true" multiValued="false"/>

I run the following query against my Solr instance:

http://{domain}:8983/solr/MIM/select?q=*&rows=2&fl=id+mailingcity&wt=json&indent=true&debugQuery=true

I get the following results back:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "debugQuery": "true",
      "fl": "id mailingcity",
      "indent": "true",
      "q": "*",
      "wt": "json",
      "rows": "2"
    }
  },
  "response": { "numFound": 18024, "start": 0, "docs": [
      { "id": "214530123", "mailingcity": "redford" },
      { "id": "204686608", "mailingcity": "detroit" }
    ]
  },
  "debug": {
    "rawquerystring": "*",
    "querystring": "*",
    "parsedquery": "MatchAllDocsQuery(*:*)",
    "parsedquery_toString": "*:*",
    "explain": {
      "214530123": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n  1.0 = queryNorm\n",
      "204686608": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n  1.0 = queryNorm\n"
    },
    "QParser": "LuceneQParser",
    ...
  }
}

However, when I try to search specifically where mailingcity=redford I don't get any results back. See the following query/results.

Query:

http://{domain}:8983/solr/MIM/select?q=mailingcity=redford&rows=2&fl=id,mailingcity&wt=json&indent=true&debugQuery=true

Results:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "debugQuery": "true",
      "fl": "id,mailingcity",
      "indent": "true",
      "q": "mailingcity=redford",
      "_": "1392218691700",
      "wt": "json",
      "rows": "2"
    }
  },
  "response": { "numFound": 0, "start": 0, "docs": [] },
  "debug": {
    "rawquerystring": "mailingcity=redford",
    "querystring": "mailingcity=redford",
    "parsedquery": "text:mailingcity text:redford",
    "parsedquery_toString": "text:mailingcity text:redford",
    "explain": {},
    "QParser": "LuceneQParser",
    ...
  }
}

If anyone can provide some info on why this is happening and how to solve it, it would be appreciated.

Thank You
Lee
RE: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something
I absolutely agree, and I even read the NRT page before posting this question. The thing that baffles me is this: doing a commit after each add kills the performance. On the other hand, when I use commitWithin and specify an (absurd) 1ms delay, I expect this behavior to be equivalent to making a commit, from a functional perspective. Seeing that there is no magic in the world, I am trying to understand what price I am actually paying when using the commitWithin feature: on the one hand it commits almost immediately, on the other hand, it performs wonderfully. Where is the catch?

-----Original Message----- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Wednesday, February 12, 2014 To: solr-user Subject: Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something [...]
Re: Importing database DIH
On 12 February 2014 20:53, Maheedhar Kolla <maheedhar.ko...@gmail.com> wrote:

> Hi, I need help with importing data through DIH (using solr-3.6.1, tomcat6). I see the following error when I try to do a full-import from my local MySQL table:
> [...]
> <str name="">Indexing failed. Rolled back all changes.</str>
> [...]
> I did search for ways to solve this problem and did create the file dataimport.properties, but no success.

You do not have to create dataimport.properties. Look in the Tomcat logs for more details on the error, and post the relevant sections here if you cannot make sense of it. My guess would be that your database credentials are incorrect, or that the SELECT is failing. Try logging into mysql from an admin tool with those credentials, and running the SELECT manually.

Regards, Gora
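A quick way to do that sanity check against the data-config posted in the original message (DBNAME, USER, PWD and TABLENAME are its placeholders, and the column names are assumed from its field mappings, so substitute your real values):

mysql -h localhost -P 3306 -u USER -p DBNAME
mysql> SELECT id, content, title FROM TABLENAME LIMIT 5;

If either the login or the SELECT fails here, DIH will fail the same way.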
Question about how to upload XML by using SolrJ Client Java Code
I was trying to use the SolrJ client to import XML data to a Solr server. I read the SolrJ wiki, which says "SolrJ lets you upload content in XML and Binary format". I realized there is an XML parser in Solr (we can use a data update handler in the default Solr admin UI, under Solr Core > Dataimport). So I was wondering how to directly use the Solr XML parser to upload XML using SolrJ Java code. I could use another open-source XML parser, but I really want to know if there is a way to call Solr's parser library. Would you mind sending me a simple code example if possible? Really appreciated. Thanks in advance.

Solr 4.6.1
Importing database DIH
Hi, I need help with importing data through DIH (using solr-3.6.1, tomcat6). I see the following error when I try to do a full-import from my local MySQL table ( http:/s/solr//dataimport?command=full-import ):

<str name="Total Requests made to DataSource">0</str>
<str name="Total Rows Fetched">0</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="">Indexing failed. Rolled back all changes.</str>

I did search for ways to solve this problem and did create the file dataimport.properties, but no success. Any help would be appreciated.

cheers, Kolla

PS: When I check the admin panel statistics for the /dataimport handler, I see the following:

Status: IDLE
Documents Processed: 0
Requests made to DataSource: 0
Rows Fetched: 0
Documents Deleted: 0
Documents Skipped: 0
Total Documents Processed: 0
Total Requests made to DataSource: 0
Total Rows Fetched: 0
Total Documents Deleted: 0
Total Documents Skipped: 0
handlerStart: 1391612468278
requests: 5
errors: 0
timeouts: 0
totalTime: 28
avgTimePerRequest: 5.6

Also, here is my data-config file:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/DBNAME" user="USER" password="PWD"/>
  <document>
    <entity name="docs" query="select * from TABLENAME">
      <field column="id" name="id"/>
      <field column="content" name="text"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>

-- Cheers, Kolla
RE: Importing database DIH
It can be anything from wrong credentials, to a missing driver in the classpath, to a malformed connection string, etc. What does the Solr log say?

-----Original Message----- From: Maheedhar Kolla [mailto:maheedhar.ko...@gmail.com] Sent: Wednesday, February 12, 2014 To: solr-user@lucene.apache.org Subject: Importing database DIH [...]
Re: Newb - Search not returning any results
On 12 February 2014 20:57, leevduhl <ld...@corp.realcomp.com> wrote:

> [...]
> However, when I try to search specifically where mailingcity=redford I don't get any results back. See the following query/results.
> Query: http://{domain}:8983/solr/MIM/select?q=mailingcity=redford&rows=2&fl=id,mailingcity&wt=json&indent=true&debugQuery=true

Please start by reading https://wiki.apache.org/solr/SolrQuerySyntax

The argument to 'q' above should be mailingcity:redford. The debug section in the results even tells you that, as the parsed query becomes "text:mailingcity text:redford", which means that it is searching the default full-text search field for the strings "mailingcity" and/or "redford".

Regards, Gora
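In other words, replacing '=' with ':' in the query parameter is the whole fix; the corrected request would look like this (same URL as above, only q changed):

http://{domain}:8983/solr/MIM/select?q=mailingcity:redford&rows=2&fl=id,mailingcity&wt=json&indent=true&debugQuery=true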
Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something
Cross-posting my answer from SO:

According to this wiki: https://wiki.apache.org/solr/NearRealtimeSearch the commitWithin is a soft commit by default. Soft commits are very efficient in terms of making the added documents immediately searchable. But! They are not on the disk yet. That means the documents are being committed into RAM. In this setup you would use the updateLog to make the Solr instance crash-tolerant.

What you do in point 2 is a hard commit, i.e. flushing the added documents to disk. Doing this after each document add is very expensive. So instead, post a bunch of documents and issue a hard commit, or even have your autoCommit set to some reasonable value, like 10 min or 1 hour (depends on your user expectations).

On Wed, Feb 12, 2014 at 5:28 PM, Pisarev, Vitaliy <vitaliy.pisa...@hp.com> wrote: [...]

-- Dmitry
Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
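"Post a bunch of documents and issue a hard commit" in SolrJ terms might look like the following sketch; the batch size is an illustrative assumption:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

// Batch the adds and do a single hard commit at the end of the run,
// instead of committing per document.
void indexBatched(SolrServer solr, List<SolrInputDocument> docs) throws Exception {
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (SolrInputDocument doc : docs) {
        batch.add(doc);
        if (batch.size() == 500) {   // illustrative batch size
            solr.add(batch);
            batch.clear();
        }
    }
    if (!batch.isEmpty()) {
        solr.add(batch);
    }
    solr.commit();                   // one hard commit for the whole run
}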
Re: Indexing strategies?
I'd seriously consider a SolrJ program that pulled the necessary data from two of your systems, held it in a cache, and then pulled the data from your main system and enriched it with the cached data. Or export your information from your remote systems and import it into a single system where you could do joins. I believe DIH has some caching ability too that you might consider.

Your basic problem is an inefficient data model where you have to query these different systems on a row-by-row basis; that's where I'd concentrate my energies.

Best, Erick

On Wed, Feb 12, 2014 at 2:09 AM, manju16832003 <manju16832...@gmail.com> wrote: [...]
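On the multi-threaded SolrJ question: a minimal sketch using ConcurrentUpdateSolrServer, which buffers adds in a queue and streams them to Solr from background threads. The URL, queue size, thread count and the buildDocsFromDbAndCache() helper are all assumptions for the example:

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

void reindex() throws Exception {
    // queue of up to 1000 buffered docs, drained by 4 background threads
    ConcurrentUpdateSolrServer solr =
        new ConcurrentUpdateSolrServer("http://localhost:8983/solr/listings", 1000, 4);
    for (SolrInputDocument doc : buildDocsFromDbAndCache()) { // hypothetical helper
        solr.add(doc);
    }
    solr.blockUntilFinished(); // wait for the background queue to drain
    solr.commit();
    solr.shutdown();
}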
Re: Phonetic search on multiple fields
First, why are you talking about DoubleMetaphone when your fieldType uses BeiderMorseFilterFactory?

Which points up a basic issue you need to wrap your head around or you'll be endlessly confused. At least I was... Your analysis chains _must_ do compatible things at index and query time. The fieldType you're using for phonetic searching does not, since it doesn't use the BeiderMorseFilterFactory at query time. So the actual values in your index are whatever the Beider... factory produces, but the terms searched are NOT transformed by that factory.

Say you index the term Erick. Your index may have (and I don't remember what the actual output of Beider is) something totally transformed like MNUA. But your query does NOT do the transformation, so the query is looking for Erick. Obviously it isn't found.

I _strongly_ advise you to take some time to get familiar with the admin/analysis page; that'll shed light on a _lot_ of analysis issues.

Best, Erick

On Wed, Feb 12, 2014 at 4:26 AM, Navaa <navnath.thomb...@xtremumsolutions.com> wrote: [...]
Re: Importing database DIH
Thanks for the comments/advice. I did mess with the drivers (by deliberately moving the libs) and it did fail as it is supposed to.

When I looked into catalina.out, I realized that the problem lay with the data directory being owned by root instead of tomcat6. I changed it so that tomcat6 can write to the data directory, and now I see a different error (after processing 100+ rows). But I am happy that the initial problem is gone. I should be able to fix these minor ones.

Thanks for the advice again.

Cheers, Kolla

PS: I think I was looking into the wrong logs before :/

On Wed, Feb 12, 2014 at 10:31 AM, Pisarev, Vitaliy <vitaliy.pisa...@hp.com> wrote: [...]
Re: Question about how to upload XML by using SolrJ Client Java Code
Hmmm, before going there let's be sure you're trying to do what you think you are. Solr does _not_ index arbitrary XML. There is a very specific format of XML that describes Solr documents that _can_ be indexed. But random XML is not supported. See the documents in example/exampledocs for the XML form of Solr docs.

So if you have arbitrary XML, you need to parse it and then construct Solr documents. One way would be to use SolrJ: parse the docs using your favorite Java parser and construct SolrInputDocuments, which you then add to the index using one of the SolrServer classes (e.g. CloudSolrServer). There really is no Solr XML parser that I know of; Solr just uses one of the standard XML parsers (e.g. SAX)...

Best, Erick

On Wed, Feb 12, 2014 at 7:21 AM, Eric_Peng <sagittariuse...@gmail.com> wrote: [...]
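A minimal sketch of that parse-and-construct approach; the input file name, element names and field mappings here are made-up assumptions, since they depend entirely on your own XML:

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Parse arbitrary XML (hypothetical <record><id>..</id><title>..</title></record>
// elements) and turn each record into a SolrInputDocument.
void importXml() throws Exception {
    Document xml = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(new File("records.xml"));
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    NodeList records = xml.getElementsByTagName("record");
    for (int i = 0; i < records.getLength(); i++) {
        Element r = (Element) records.item(i);
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", r.getElementsByTagName("id").item(0).getTextContent());
        doc.addField("title", r.getElementsByTagName("title").item(0).getTextContent());
        solr.add(doc);
    }
    solr.commit();
}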
Re: Question about how to upload XML by using SolrJ Client Java Code
On 2/12/2014 8:21 AM, Eric_Peng wrote: [...]

When the docs say that SolrJ lets you upload data in XML and binary format, what they actually mean is that SolrJ will create an update request that is formatted using XML, not that it will let you send arbitrary XML data. It is referring to the specific XML format shown here:

http://wiki.apache.org/solr/UpdateXmlMessages#add.2Freplace_documents

As for an XML parser ... SolrJ's XMLResponseParser is a class that accepts XML *responses* from Solr and translates them into the Java response object. There is also BinaryResponseParser.

The only things that I am aware of in Solr that will deal with XML as the data source are the XPathEntityProcessor in the dataimport handler and the ExtractingRequestHandler, which uses Apache Tika. Both of these are actually contrib modules -- jar files for these features are in the download, but not built into Solr or SolrJ.

If you are using the extracting request handler, you could probably use the DirectXmlRequest object, where 'xml' is a String with the XML in it:

DirectXmlRequest req = new DirectXmlRequest("/update/extract", xml);
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("someParam", "someValue");
req.setParams(params);
NamedList<Object> response = solrServer.request(req);

I hope that you are right and there actually is an XML parser built into SolrJ. We would both learn something.

Thanks, Shawn
Re: Newb - Search not returning any results
Thanks, the syntax correction solved the problem. I actually thought I had tried that before I posted.

Thanks
Lee
Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something
Here's some additional background that may shed light on the performance:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best, Erick

On Wed, Feb 12, 2014 at 7:40 AM, Dmitry Kan <solrexp...@gmail.com> wrote: [...]
Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something
The explicit commit will cause your app to be delayed until that commit completes, and then Solr will be idle until that request completion makes its way back to your app and you submit another request which finds its way to Solr - maybe a few ms, including network latency. That interval of time could well be more than enough for the short-interval autoCommit or commitWithin to run in the background, in parallel with the request returning to your app and your app submitting the subsequent request. The magic of asynchronous operation in a parallel and distributed computing environment, coupled with multi-core processors and parallel threads.

-- Jack Krupansky

-----Original Message----- From: Pisarev, Vitaliy Sent: Wednesday, February 12, 2014 10:28 AM To: solr-user@lucene.apache.org Subject: RE: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something [...]
Re: Question about how to upload XML by using SolrJ Client Java Code
Thanks a lot, learned a lot from it.
Re: Question about how to upload XML by using SolrJ Client Java Code
Thank you so much Erick, I will try to write my own XML parser.
Re: Using numeric ranges in Solr query
Hello!

Just specify the left boundary as well, like: price:[900 TO 1000]

-- Regards, Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

> When user enters a price in the price field, for example 1000 USD, I want to fetch all items with a price around 1000 USD. [...]
Re: Question about how to upload XML by using SolrJ Client Java Code
There is also an XSLT update handler option to transform raw XML to Solr XML on the fly. If anybody here has used it, feel free to chime in.

See:
http://wiki.apache.org/solr/XsltUpdateRequestHandler
and
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-UsingXSLTtoTransformXMLIndexUpdates

-- Jack Krupansky

-----Original Message----- From: Eric_Peng Sent: Wednesday, February 12, 2014 11:42 AM To: solr-user@lucene.apache.org Subject: Re: Question about how to upload XML by using SolrJ Client Java Code [...]
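Going by the second reference above, the usage looks roughly like this: put a stylesheet in the core's conf/xslt/ directory and name it in the tr parameter of the update request (the stylesheet and data file names here are assumptions):

curl "http://localhost:8983/solr/update?commit=true&tr=updateXml.xsl" \
  -H "Content-Type: text/xml" --data-binary @mydata.xml

The stylesheet's job is to transform your raw XML into the <add><doc>...</doc></add> update format before Solr indexes it.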
Using numeric ranges in Solr query
When a user enters a price in the price field, for example 1000 USD, I want to fetch all items with a price around 1000 USD. I found in the documentation that I can use price:[* TO 1000]. That will get all items from 1 to 1000 USD. But I want to get results where the price is between 900 and 1000 USD. Any help is appreciated. Thank you.
Re: Using numeric ranges in Solr query
Is price a float/double field?

price:[99.5 TO 100.5] -- price near 100
price:[900 TO 1000] or price:[899.5 TO 1000.5]

-- Jack Krupansky

-----Original Message----- From: jay67 Sent: Wednesday, February 12, 2014 12:03 PM To: solr-user@lucene.apache.org Subject: Using numeric ranges in Solr query [...]
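If the field isn't numeric yet, range queries like these need a numeric (trie) field type; a sketch of a suitable schema.xml definition, assuming the stock example schema's tfloat type is available:

<field name="price" type="tfloat" indexed="true" stored="true"/>

With a string field, [900 TO 1000] would be compared lexicographically and give surprising results.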
Re: Set up embedded Solr container and cores programmatically to read their configs from the classpath
Hi Robert,

I don't think this is possible at the moment, but I hope to get https://issues.apache.org/jira/browse/SOLR-4478 in for Lucene/Solr 4.7, which should allow you to inject your own SolrResourceLoader implementation for core creation (it sounds as though you want to wrap the core's loader in a ClasspathResourceLoader). You could try applying that patch to your setup and see if that helps you out.

Alan Woodward
www.flax.co.uk

On 11 Feb 2014, at 10:41, Robert Krüger wrote:

Hi, I have an application with an embedded Solr instance (and I want to keep it embedded), and so far I have been setting up my Solr installation programmatically, using folder paths to specify where the specific container or core configs are. I have used the CoreContainer methods createAndLoad and create with File arguments, and this works fine.

However, now I want to change this so that all configuration files are loaded from certain locations using the classloader, but I have not been able to get this to work. E.g. I want to have my solr config located in the classpath at my/base/package/solr/conf and the core configs at my/base/package/solr/cores/core1/conf, my/base/package/solr/cores/core2/conf, etc. Is this possible at all? Looking through the source code it seems that specifying classpath resources in such a qualified way is not supported, but I may be wrong.

I could get this to work for the container by supplying my own implementation of SolrResourceLoader that allows a base path to be specified for the resources to be loaded (I first thought that would happen already when specifying instanceDir accordingly, but looking at the code it does not: for resources loaded through the classloader, instanceDir is not prepended). However, then I am stuck with the loading of the cores' resources, as the respective code (see org.apache.solr.core.CoreContainer#createFromLocal) instantiates a SolrResourceLoader internally.

Thanks for any help with this (be it a clarification that it is not possible).

Robert
RE: Solr4 performance
Does Solr4 load the entire index into a memory-mapped file? What is the eviction policy of this memory-mapped file? Can we control it?

_____
From: Joshi, Shital [Tech] Sent: Wednesday, February 05, 2014 12:00 PM To: 'solr-user@lucene.apache.org' Subject: Solr4 performance

Hi,

We have a SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute boxes (cloud). We're using local disk (/local/data) to store Solr index files. All hosts have 60GB RAM, and the Solr4 JVMs are running with a max 30GB heap size. So far we have 470 million documents. We are using custom sharding, and all shards have ~9-10 million documents. We have a GUI sending queries to this cloud, and the GUI has a 30-second timeout. Lately we're getting many timeouts on the GUI, and upon checking we found that all timeouts are happening on 2 hosts. The admin GUI for one of the hosts shows 96% physical memory usage, but the other host looks perfectly fine. The two hosts serve different shards. Would increasing the RAM of these two hosts make these timeouts go away? What else can we check?

Many Thanks!
Re: Solr4 performance
No, Solr doesn't load the entire index in memory. I think you'll find Uwe's blog most helpful on this matter:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

On Thu, Feb 13, 2014 at 12:27 AM, Joshi, Shital <shital.jo...@gs.com> wrote: [...]

-- Regards, Shalin Shekhar Mangar.
Re: Solr4 performance
Shital,

Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's a pretty decent explanation of memory-mapped files. I don't believe that the default configuration for Solr is to use MMapDirectory, but even if it does, my understanding is that the entire file won't be forcibly cached by Solr. The OS's filesystem cache controls what's actually in RAM, and the eviction process will depend on the OS.

Thanks, Greg

On Feb 12, 2014, at 12:57 PM, Joshi, Shital <shital.jo...@gs.com> wrote: [...]
RE: Indexing spatial fields into SolrCloud (HTTP)
Hi David, I finally got back to this again, after getting sidetracked for a couple of weeks. I implemented things in accordance with my understanding of what you wrote below. Using SolrJ, the code to index the spatial field is as follows, private void addSpatialField(double lat, double lon, SolrInputDocument document) { StringBuilder sb = new StringBuilder(); sb.append(lat).append(,).append(lon); document.addField(location, sb.toString()); } Using Solr 4.3.1 and spatial4j 0.3, I am getting the following error in the solr logs: 1518436 [qtp1529118084-24] ERROR org.apache.solr.core.SolrCore â org.apache.solr.common.SolrException: com.spatial4j.core.exception.InvalidShapeException: Unable to read: Pt(x=-72.544123,y=41.85) at org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:144) at org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:118) spatial4j 0.3 is looking for something like POINT but Solr is converting my lat,long to Pt(x=-72.544123,y=41.85). Version mismatch? Thanks in advance for your help! Jim Beale From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: Monday, January 13, 2014 11:30 AM To: Beale, Jim (US-KOP); solr-user@lucene.apache.org Subject: Re: Indexing spatial fields into SolrCloud (HTTP) Hello Jim, By the way, using GeohashPrefixTree.getMaxLevelsPossible() is usually an extreme choice. Instead you probably want to choose only as many levels needed for your distance tolerance. See SpatialPrefixTreeFactory which you can use outright or borrow the code it uses. Looking at your code, I see you are coding against Solr directly in Java instead of where most people do this as an HTTP web service. But it's unclear in what context your code runs because you are getting into the guts of things that you normally don't have to do, even if your using SolrJ or writing some sort of UpdateRequestProcessor. Simply configure the field type in schema.xml appropriately, and then to index a point simply give Solr a string for the field in latitude, longitude format. I don't know why you are using field.tokenStream(analyzer) for the field value - that is clearly wrong and the cause of the error. I think your confusion more has to do with differences in coding to Lucene versus Solr; this being an actual spatial concern. You referenced SpatialDemoUpdateProcessorFactory so I see you have looked at SolrSpatialSandbox on GitHub. That particular URP should get some warnings added to it in the code to suggest that you probably should do what it does. If you look at the solrconfig.xml that configures it, there is a warning as follows: !-- spatial Only needed for an SpatialDemoUpdateProcessorFactory which copies spatial objects from one field to other spatial fields in object form to avoid redundant/inefficient string to spatial object de-serialization. -- Even if you have a similar circumstance, you're code doesn't quite look like this URP. You shouldn't need to reference the SpatialStrategy, for example. ~ David From: Beale, Jim (US-KOP) jim.be...@hibu.commailto:jim.be...@hibu.com Date: Friday, January 10, 2014 at 12:15 PM To: solr-user@lucene.apache.orgmailto:solr-user@lucene.apache.org solr-user@lucene.apache.orgmailto:solr-user@lucene.apache.org Cc: Smiley, David W. dsmi...@mitre.orgmailto:dsmi...@mitre.org Subject: Indexing spatial fields into SolrCloud (HTTP) I am porting an application from Lucene to Solr which makes use of spatial4j for distance searches. 
The Lucene version works correctly but I am having a problem getting the Solr version to work in the same way.

Lucene version:

    SpatialContext geoSpatialCtx = SpatialContext.GEO;
    geoSpatialStrategy = new RecursivePrefixTreeStrategy(new GeohashPrefixTree(
        geoSpatialCtx, GeohashPrefixTree.getMaxLevelsPossible()), DocumentFieldNames.LOCATION);

    Point point = geoSpatialCtx.makePoint(lon, lat);
    for (IndexableField field : geoSpatialStrategy.createIndexableFields(point)) {
        document.add(field);
    }
    // Store the field
    document.add(new StoredField(geoSpatialStrategy.getFieldName(), geoSpatialCtx.toString(point)));

Solr version:

    Point point = geoSpatialCtx.makePoint(lon, lat);
    for (IndexableField field : geoSpatialStrategy.createIndexableFields(point)) {
        try {
            solrDocument.addField(field.name(), field.tokenStream(analyzer));
        } catch (IOException e) {
            LOGGER.error("Failed to add geo field to Solr index", e);
        }
    }
    // Store the field
    solrDocument.addField(geoSpatialStrategy.getFieldName(), geoSpatialCtx.toString(point));

The server-side error is as follows: Caused by: com.spatial4j.core.exception.InvalidShapeException: Unable to read:
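[For comparison with the Lucene setup above: the pure-Solr equivalent of all this is normally just schema configuration. A typical Solr 4.x definition would look roughly like the following; the field and type names, and the distErrPct/maxDistErr values, are illustrative assumptions, not taken from this thread.]

    <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
               geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
    <field name="location" type="location_rpt" indexed="true" stored="true"/>

Documents then supply the value as a plain "lat,lon" string, exactly as in the SolrJ snippet above, and Solr builds the prefix-tree terms itself.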
filtering/faceting by a big list IDs
Hi all, I am running a Solr application and I need to implement a feature that requires faceting and filtering on a large list of IDs. The IDs are stored outside of Solr and are specific to the currently logged-on user. An example of this is the articles/tweets the user has read in the last few weeks. Note that the IDs here are the real document IDs and not Lucene internal docids. So the question is: what would be the best way to implement this in Solr? The list could be as large as tens of thousands of IDs. The obvious way of rewriting the Solr query to add the ID list as "facet.query" and "fq" doesn't seem to be the best way because: a) the query would be very long, and b) it would surely exceed the default limit of 1024 Boolean clauses, and I am sure the limit is there for a reason. I had a similar problem before, but back then I was using Lucene directly, and the way I solved it was to use a MultiTermQuery to retrieve the internal docids from the ID list and then apply the resulting DocSet to counting and filtering. It worked reasonably for lists of size ~10K, and with proper caching it was working OK. My current application is so invested in Solr that going back to Lucene is not an option anymore. All advice/suggestions are welcome. Thanks, Tri
Re: Solr4 performance
On 2/12/2014 12:07 PM, Greg Walters wrote:
Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's a pretty decent explanation of memory mapped files. I don't believe that the default configuration for solr is to use MMapDirectory but even if it does my understanding is that the entire file won't be forcibly cached by solr. The OS's filesystem cache should control what's actually in ram and the eviction process will depend on the OS.

I only have a little bit to add. Here's the first thing that Uwe's blog post (linked above) says: "Since version 3.1, Apache Lucene and Solr use MMapDirectory by default on 64bit Windows and Solaris systems; since version 3.3 also for 64bit Linux systems." The default in Solr 4.x is NRTCachingDirectory, which uses MMapDirectory by default under the hood. A summary of all this that should be relevant to the original question: it's the *operating system* that handles memory mapping, including any caching that happens. Assuming that you don't have a badly configured virtual machine setup, I'm fairly sure that only real memory gets used, never swap space on the disk. If something else on the system makes a memory allocation, the operating system will instantly give up memory used for caching and mapping. One of the strengths of mmap is that it can't exceed available resources unless it's used incorrectly. Thanks, Shawn
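[For reference, the stock Solr 4.x solrconfig.xml declares the directory factory roughly like this; NRTCachingDirectoryFactory delegates to MMapDirectory on 64-bit platforms, and as Shawn says it is the OS, not Solr, that decides what actually stays resident:]

    <directoryFactory name="DirectoryFactory"
                      class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>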
Re: filtering/faceting by a big list IDs
Tri, You will most likely need to implement a custom QParserPlugin to efficiently handle what you described. Inside of this QParserPlugin you could create the logic that would bring in your outside list of IDs and build a DocSet that could be applied to the fq and the facet.query. I haven't attempted to use a QParserPlugin with a facet.query, but in theory it should work. With the filter query you also have the option of implementing your Query as a PostFilter. PostFilter logic is applied at collect time, so the logic only needs to be applied to the documents that match the query. In many cases this can be faster, especially when result sets are relatively small but the index is large. Joel Bernstein Search Engineer at Heliosearch

On Wed, Feb 12, 2014 at 2:12 PM, Tri Cao tm...@me.com wrote: [snip]
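[As an illustration of what Joel describes, here is a minimal, untested sketch of such a plugin against the Solr/Lucene 4.x APIs. Everything specific here is an assumption for demonstration: the class and parser names, the "f" local param, and passing the IDs comma-separated in the query string - a real plugin would fetch them from the external per-user store instead.]

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queries.TermsFilter;
    import org.apache.lucene.search.Query;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.QParserPlugin;
    import org.apache.solr.search.SolrConstantScoreQuery;
    import org.apache.solr.search.SyntaxError;

    public class UserIdsQParserPlugin extends QParserPlugin {
        @Override
        public void init(NamedList args) {}

        @Override
        public QParser createParser(String qstr, SolrParams localParams,
                                    SolrParams params, SolrQueryRequest req) {
            return new QParser(qstr, localParams, params, req) {
                @Override
                public Query parse() throws SyntaxError {
                    // Field holding the external document IDs (assumed default: "id").
                    String field = localParams.get("f", "id");
                    // Here the IDs arrive comma-separated for brevity; fetch them
                    // from the external user store in a real implementation.
                    List<Term> terms = new ArrayList<Term>();
                    for (String id : qstr.split(",")) {
                        terms.add(new Term(field, id.trim()));
                    }
                    // A single TermsFilter sidesteps the 1024 Boolean clause limit.
                    return new SolrConstantScoreQuery(new TermsFilter(terms));
                }
            };
        }
    }

It would be registered in solrconfig.xml with something like <queryParser name="userids" class="com.example.UserIdsQParserPlugin"/> and invoked as fq={!userids f=id}12,34,56. For very large lists, implementing PostFilter on top of this, as Joel notes, avoids materializing the full DocSet up front.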
Re: Indexing spatial fields into SolrCloud (HTTP)
That’s pretty weird. It appears that somehow a Spatial4j Point class is having its toString() called on it (which looks like Pt(x=-72.544123,y=41.85)) and then Spatial4j is trying to parse this, which isn’t a valid format — the toString is more for debug-ability. Your SolrJ code looks totally fine. Perhaps you’ve got some funky UpdateRequestProcessor from experimentation you’ve done that’s parsing then toString’ing it? And also, your stack trace should have more to it than what you present here. It may be on the server side versus the HTTP error page, which can get abbreviated. ~ David

From: Beale, Jim (US-KOP) jim.be...@hibu.com Date: Wednesday, February 12, 2014 at 2:05 PM To: Smiley, David W. dsmi...@mitre.org, solr-user@lucene.apache.org Subject: RE: Indexing spatial fields into SolrCloud (HTTP) [snip]
Re: Solr4 performance
And perhaps one other, but very pertinent, recommendation: allocate only as little heap as is necessary. By allocating more, you are working against the OS caching. Knowing how much is enough is a bit tricky, though. Best, roman

On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey s...@elyograg.org wrote: [snip]
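[As a concrete illustration of Roman's point: on the 60GB machines from this thread, a start-up line along these lines leaves most of the RAM to the OS page cache. The 8GB figure is purely an assumption and has to be tuned against your real working set.]

    java -Xms8g -Xmx8g -jar start.jar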
Re: Searching phonetic by DoubleMetaphone soundex encoder
Navaa, you need query expansion for that. E.g. if your query goes through dismax, you need to add the two field names to the qf parameter. The nice thing is that qf can be:

    text^3.0 text.stemmed^2 text.phonetic^1

and thus exact matches are preferred to stemmed or phonetic matches. This is configured in solrconfig.xml. It's quite common to create your own query component to do more than just dismax for this. Hope it helps. paul

On 12 February 2014 at 14:22, Navaa navnath.thomb...@xtremumsolutions.com wrote:

Hi, I am using Solr for searching phonetically equivalent strings. My schema contains...

    <fieldType name="text_general_doubleMetaphone" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

and the fields are

    <field name="fname" type="text_general" indexed="true" stored="true" required="false"/>
    <field name="fname_sound" type="text_general_doubleMetaphone" indexed="true" stored="true" required="false"/>
    <copyField source="fname" dest="fname_sound"/>

It works when I search stfn === stephen, stephn. But I am also expecting stephn = stephen. How will I get this result? Am I doing something wrong? Thanks in advance

-- View this message in context: http://lucene.472066.n3.nabble.com/Searching-phonetic-by-DoubleMetaphone-soundex-encoder-tp4116885.html Sent from the Solr - User mailing list archive at Nabble.com.
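[A minimal sketch of what Paul describes, in solrconfig.xml terms, using the field names from Navaa's schema above; the handler name and the boost weights are assumptions to be tuned:]

    <requestHandler name="/phonetic" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">fname^3.0 fname_sound^1.0</str>
      </lst>
    </requestHandler>

With both fields in qf, a query like stephn matches fname_sound phonetically while an exact spelling still scores higher via fname.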
Re: Optimize Index in solr 4.6
On 2/6/2014 4:00 AM, Shawn Heisey wrote: I would not recommend it, but if you know for sure that your infrastructure can handle it, then you should be able to optimize them all at once by sending parallel optimize requests with distrib=false directly to the Solr cores that hold the shard replicas, not the collection. Followup on this thread: Evidence now suggests (thank you, Yago!) that sending an optimize request with distrib=false might *NOT* optimize just the core that receives the request. I can confirm that this is the case on a SolrCloud 4.2.1 setup with one shard and replicationFactor=2. It optimized that core, then when that was finished, optimized the other replica. I would have already filed an issue in Jira, except that I do not currently have any way to test this on 4.6.1, so I do not know if this is still the way it works. Also, I do not have a distributed SolrCloud index available. I will be looking into writing a unit test, but my grasp of SolrCloud tests is very weak. Thanks, Shawn
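[For reference, the kind of per-core request Shawn describes looks like this; the host and core names are hypothetical. Per the finding above, the distrib=false flag may not actually be honored for optimize on the affected versions, so verify the behavior on your own cluster first:]

    curl "http://host1:8983/solr/collection1_shard1_replica1/update?optimize=true&distrib=false"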
RE: Indexing spatial fields into SolrCloud (HTTP)
Hi David, You wrote: "Perhaps you’ve got some funky UpdateRequestProcessor from experimentation you’ve done that’s parsing then toString’ing it?" No, nothing at all. The update processing is straight out-of-the-box Solr. "And also, your stack trace should have more to it than what you present here." I trimmed the stack trace because it seemed like TMI, but here it is for completeness' sake:

1518436 [qtp1529118084-24] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: com.spatial4j.core.exception.InvalidShapeException: Unable to read: Pt(x=-72.544123,y=41.85)
at org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:144)
at org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:118)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:186)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:257)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:199)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:504)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:640)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:396)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
…..

"Your SolrJ code looks totally fine." Finally, I changed the code to the following:

    private void addSpatialLcnFields(double lat, double lon, SolrInputDocument document) {
        Point point = geoSpatialCtx.makePoint(lon, lat);
        document.addField(geoSpatialStrategy.getFieldName(), geoSpatialCtx.toString(point));
    }

and at least it isn’t throwing exceptions now. I’m not sure what is going into the index yet; I’ll have to wait for it to finish. :( Does that code seem correct? I want to avoid the deprecated API, but so far I haven’t found any alternatives which work. Thanks, Jim Beale

From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: Wednesday, February 12, 2014 3:07 PM To: Beale, Jim (US-KOP); solr-user@lucene.apache.org Subject: Re: Indexing spatial fields into SolrCloud (HTTP) [snip]
Re: Indexing spatial fields into SolrCloud (HTTP)
Your new code should also work, and should be equivalent. The longer stack trace you have is of the wrapping SolrException, which wraps another exception — InvalidShapeException. You should also see the stack trace of InvalidShapeException, which should originate out of Spatial4j. ~ David

From: Beale, Jim (US-KOP) jim.be...@hibu.com Date: Wednesday, February 12, 2014 at 5:21 PM To: Smiley, David W. dsmi...@mitre.org, solr-user@lucene.apache.org Subject: RE: Indexing spatial fields into SolrCloud (HTTP) [snip]
Weird issue with q.op=AND
Hi, I'm facing a weird problem while using the q.op=AND condition. It looks like it gets into some conflict if I use multiple filtering conditions in appends. It works as long as I have one filtering condition in appends:

    <lst name="appends">
      <str name="fq">Source:"TestHelp"</str>
    </lst>

Now, the moment I add an additional parameter, search stops returning any result:

    <lst name="appends">
      <str name="fq">Source:"TestHelp" | Source:"TestHelp2"</str>
    </lst>

If I remove q.op=AND from the request handler, I get results back. Data is present for both Source values I'm using, so it's not a filtering issue. Even a blank query fails to return data. Here's my request handler:

    <requestHandler name="/testhandler" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <float name="tie">0.01</float>
        <str name="wt">velocity</str>
        <str name="v.template">browse</str>
        <str name="v.contentType">text/html;charset=UTF-8</str>
        <str name="v.layout">layout</str>
        <str name="v.channel">testhandler</str>
        <str name="defType">edismax</str>
        <str name="q.op">AND</str>
        <str name="q.alt">*:*</str>
        <str name="rows">15</str>
        <str name="fl">id,url,Source2,text</str>
        <str name="qf">text^1.5 title^2</str>
        <str name="bq">Source:TestHelp^3 Source:TestHelp2^0.85</str>
        <str name="bf">recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0</str>
        <str name="df">text</str>
        <!-- facets -->
        <str name="facet">on</str>
        <str name="facet.mincount">1</str>
        <str name="facet.limit">100</str>
        <str name="facet.field">language</str>
        <str name="facet.field">Source</str>
        <!-- Highlighting defaults -->
        <str name="hl">true</str>
        <str name="hl.fl">text title</str>
        <str name="f.text.hl.fragsize">250</str>
        <str name="f.text.hl.alternateField">ShortDesc</str>
        <!-- Spell check settings -->
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.onlyMorePopular">false</str>
        <str name="spellcheck.extendedResults">false</str>
        <str name="spellcheck.count">1</str>
        <!-- Shard Tolerant -->
        <str name="shards.tolerant">true</str>
      </lst>
      <lst name="appends">
        <str name="fq">Source:"TestHelp" | Source2:"TestHelp2"</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

Not sure what's going wrong. I'm using a SolrCloud environment with 2 shards, each having a replica. Any pointers will be appreciated. Thanks, Shamik
Re: Weird issue with q.op=AND
On 2/12/2014 3:32 PM, Shamik Bandopadhyay wrote:
Hi, I'm facing a weird problem while using the q.op=AND condition. It looks like it gets into some conflict if I use multiple filtering conditions in appends. It works as long as I have one filtering condition in appends:
    <lst name="appends">
      <str name="fq">Source:"TestHelp"</str>
    </lst>
Now, the moment I add an additional parameter, search stops returning any result:
    <lst name="appends">
      <str name="fq">Source:"TestHelp" | Source:"TestHelp2"</str>
    </lst>
If I remove q.op=AND from the request handler, I get results back. Data is present for both Source values I'm using, so it's not a filtering issue. Even a blank query fails to return data.

I'm pretty sure that's not valid Solr query syntax for what you're trying to do. Try this instead, although with these specific examples I would leave the quotes out of the fq value:

    <lst name="appends">
      <str name="fq">Source:(TestHelp OR TestHelp2)</str>
    </lst>

It's pretty much accidental (a result of the query analysis chain) that it was working when you didn't have q.op=AND. You can verify what I'm saying by looking at the parsed query after adding debugQuery=true as a query option. Thanks, Shawn
Re: Weird issue with q.op=AND
Thanks a lot Shawn. Changing the appends filtering based on your suggestion worked. The part which confused me big time is the syntax I've been using so far without an issue (barring the q.op part):

    <lst name="appends">
      <str name="fq">Source:"TestHelp" | Source:"downloads" | -AccessMode:"internal" | -workflowparentid:[* TO *]</str>
    </lst>

This has been working as expected and applies the filter correctly. Just curious: if it's an invalid syntax, how is Solr handling this? -- View this message in context: http://lucene.472066.n3.nabble.com/Weird-issue-with-q-op-AND-tp4117013p4117022.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Weird issue with q.op=AND
On 2/12/2014 4:58 PM, shamik wrote:
Thanks a lot Shawn. Changing the appends filtering based on your suggestion worked. The part which confused me big time is the syntax I've been using so far without an issue (barring the q.op part):
    <lst name="appends">
      <str name="fq">Source:"TestHelp" | Source:"downloads" | -AccessMode:"internal" | -workflowparentid:[* TO *]</str>
    </lst>
This has been working as expected and applies the filter correctly. Just curious: if it's an invalid syntax, how is Solr handling this?

Honestly, I can't really say what's going on here. After I got this, I tried some example queries like that and they do seem to be parsed right. You could try turning on debugQuery for the query that doesn't work and see if you can spot what the problem is. I had never seen a query syntax with | in it before. The other syntax is a little more explicit, though. Thanks, Shawn
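[For anyone following along, the debug output Shawn mentions can be pulled with a request along these lines - the host, collection, and handler names follow the examples in this thread; the parsed filters appear under parsed_filter_queries in the debug section of the response:]

    curl "http://localhost:8983/solr/collection1/testhandler?q=*:*&debugQuery=true&wt=xml"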
change character correspondence in icu lib
Hello, I use icu4j-49.1.jar and lucene-analyzers-icu-4.6-SNAPSHOT.jar for one of the fields, in the form

    <filter class="solr.ICUFoldingFilterFactory"/>

I need to change the letter that one of the accented characters is folded to. I made changes to the file lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt, recompiled Solr and Lucene, and replaced the above jars with the new ones, but there is no change in the indexing and parsing of keywords. Any ideas where the appropriate change must be made? Thanks. Alex.
Re: change character correspondence in icu lib
Not a direct answer, but the usual next question is: are you absolutely sure you are using the right jars? Try renaming them and restarting Solr. If it complains, you got the right ones. If not... Also, unzip those jars and see if your file made it all the way through the build pipeline. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Feb 13, 2014 at 8:12 AM, alx...@aim.com wrote: [snip]
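[Both checks Alexandre suggests can be scripted; the utr30.nrm name below is an assumption about the 4.x build, which compiles the utr30 text rules into a binary data file inside the analyzers-icu jar:]

    # Rename the jar; if Solr still starts and analyzes text, it was loading another copy:
    mv lucene-analyzers-icu-4.6-SNAPSHOT.jar lucene-analyzers-icu-4.6-SNAPSHOT.jar.bak

    # Check that the rebuilt jar actually contains freshly compiled folding data:
    unzip -l lucene-analyzers-icu-4.6-SNAPSHOT.jar | grep -i utr30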
Re: Weird issue with q.op=AND
Thanks, I'll take a look at the debug data. -- View this message in context: http://lucene.472066.n3.nabble.com/Weird-issue-with-q-op-AND-tp4117013p4117047.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing strategies?
Hi Erick, Thank you very much, those are valuable suggestions :-). I will give them a try. Appreciate your time. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-strategies-tp4116852p4117050.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Weird issue with q.op=AND
Did you mean to use || for the OR operator? A single | is not treated as an operator - it will be treated as a term and sent through normal term analysis. -- Jack Krupansky

-----Original Message----- From: Shamik Bandopadhyay Sent: Wednesday, February 12, 2014 5:32 PM To: solr-user@lucene.apache.org Subject: Weird issue with q.op=AND [snip]
Limit amount of search result
Dear all gurus, I would like to limit the number of search results per shop. Let's say I have many shops selling shirts. When I search for "white shirt", I want a maximum number of results per shop (e.g. 5). The result should look like this...
- Shop A
- Shop A
- Shop B
- Shop B
- Shop B
- Shop B
- Shop B
- Shop C
- Shop C
- Shop C
Any suggestion would be much appreciated. Thank you very much, Chun. -- View this message in context: http://lucene.472066.n3.nabble.com/Limit-amount-of-search-result-tp4117062.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Limit amount of search result
Chun, Have you looked at the Grouping / Field Collapsing feature in Solr? https://wiki.apache.org/solr/FieldCollapsing If shop is one of your fields, you can use field collapsing on that field with a maximum of n results to return per field value (or group). Sameer. -- www.measuredsearch.com tw: measuredsearch

On Wednesday, February 12, 2014, rachun rachun.c...@gmail.com wrote: [snip]

-- Sameer Maggon Founder, Measured Search m: 310.344.7266 tw: @measuredsearch w: http://www.measuredsearch.com
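[As a concrete example of the request Sameer describes, assuming the shop name is indexed in a field called shop - adjust the host, collection, and field names to your setup:]

    http://localhost:8983/solr/collection1/select?q=white+shirt&group=true&group.field=shop&group.limit=5

Adding group.main=true flattens the grouped response back into an ordinary result list, which matches the layout in the original question.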
Re: Join Scoring
Re-posting... Thanks, Anand

On 2/12/2014 10:55 AM, anand chandak wrote:
Thanks David, really helpful response. You mentioned that if we have to add scoring support in Solr, then a possible approach would be to add a custom QueryParser, which might use Lucene's JOIN module. I have tried this approach and it makes things slow, because I believe it is doing more searches. Curious: would it be possible instead to enhance Solr's existing JoinQParserPlugin and add the scoring support in the same class? Do you think that is feasible and recommended? If yes, what would it take (high level) in terms of code changes - any pointers? Thanks, Anand

On 2/12/2014 10:31 AM, David Smiley (@MITRE.org) wrote:
Hi Anand. Solr's JOIN query, {!join}, constant-scores. It's simpler, faster, and more memory efficient (particularly the worst-case memory use) to implement the JOIN query without scoring, so that's why. Of course, you might want it to score and pay whatever penalty is involved. For that you'll need to write a Solr QueryParser that might use Lucene's join module, which has scoring variants. I've taken this approach before. You asked a specific question about the purpose of JoinScorer when it doesn't actually score. Lucene's Query produces a Weight, which in turn produces a Scorer that is a DocIdSetIterator plus it returns a score. So Queries have to have a Scorer to match any document, even if the score is always 1. Solr does indeed have a lot of caching; that may be in play here when comparing against a quick attempt at using Lucene directly. In particular, the matching documents are likely to end up in Solr's DocumentCache. Returning the stored fields that come back in search results is one of the more expensive things Lucene/Solr does. I also think you noted that the fields on documents from the "from" side of the query are not available to be returned in search results, just the "to" side. Yup; that's true. To remedy this, you might write a Solr SearchComponent that adds fields from the "from" side. That could be tricky to do; it would probably need to re-run the from-side query, but filtered to the matching top-N documents being returned. ~ David

anand chandak wrote:
Resending, if somebody can please respond. Thanks, Anand

On 2/5/2014 6:26 PM, anand chandak wrote:
Hi, Having a question on join scoring: why doesn't the Solr join query return scores? Looking at the code, I see there's a JoinScorer defined in the JoinQParserPlugin class. If it's not used for scoring, where is it actually used? Also, to evaluate the performance of the Solr join plugin vs Lucene's JoinUtil, I ran the same join query against the same data set and schema, and in the results I am always seeing the QTime for Solr much lower than Lucene's. What is the reason behind this? Solr doesn't return scores - could that cause so much of a difference? My guess is Solr has a very sophisticated caching mechanism and that might be coming into play; is that true? Or is there a difference in the way the JOIN happens in the two approaches? If I understand correctly, both implementations use a two-pass approach: first collect all the terms from the fromField, then return all documents that have matching terms in the toField. If somebody can throw some light on this, I would highly appreciate it. Thanks, Anand

- Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Join-Scoring-tp4115539p4116818.html Sent from the Solr - User mailing list archive at Nabble.com.
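[For readers landing on this thread, the constant-scoring join under discussion is invoked like this; the field names below are placeholders for whatever links your two document types, not anything from this schema:]

    q={!join from=parent_id to=id}type:comment

This runs type:comment, collects the parent_id values of the matching documents, and returns the documents whose id matches - which is why, as David notes, only fields from the "to" side are available in the results.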
Re: Searching phonetic by DoubleMetaphone soundex encoder
Navaa, You need the query to be sent to the two fields. In dismax, this is easy. Paul

On 12 February 2014 at 14:22:33 CET, Navaa navnath.thomb...@xtremumsolutions.com wrote: [snip]

-- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
APACHE SOLR: Pass a file as query parameter and then parse each line to form a criteria
Hi, I am new to Solr and I need help with the following. PROBLEM: I have a huge file of values, one per line, and I want to use it as an inclusion or exclusion list in the query, i.e. with each line ORed together (line1 OR line2 OR ...). How can this be achieved in Solr? Is there a custom implementation that I would need to write? Also, would it help to implement a custom filter? Thank You, Rajeev Nadgauda. -- View this message in context: http://lucene.472066.n3.nabble.com/APACHE-SOLR-Pass-a-file-as-query-parameter-and-then-parse-each-line-to-form-a-criteria-tp4117066.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr delta indexing approach
Hi, I am working on a prototype where I have a content source, I am indexing all its documents, and I store the index in Solr. Now I have a pre-condition that my content source is ever-changing, meaning there is always new content added to it. I have read that Solr re-indexes the full source every time it is asked to index, but this may lead to under-utilization of resources, as the same documents get re-indexed again and again. Is there any approach to handle such scenarios? E.g. I have 10,000 documents in my source which have been indexed by Solr till today, but the next day my source has 11,000 documents. I want to index only the new 1,000 documents, not all 11,000. Can anybody suggest something for this? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117068.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr delta indexing approach
You have read that Solr needs to reindex a full source. That's correct (unless you use atomic updates). But - and this is the important point - it is per document. So, once you have indexed your 10,000 documents, you don't need to worry about them until they change. Just go ahead and index only your additional documents. I am assuming your source system can figure out which ones are new (timestamp, etc). Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Feb 13, 2014 at 12:15 PM, lalitjangra lalit.j.jan...@gmail.com wrote: [snip]
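[A minimal SolrJ 4.x sketch of the pattern Alexandre describes; the core name is an assumption, and the caller is responsible for selecting only the documents the source reports as new or changed since the last run (e.g. by modification timestamp):]

    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class DeltaPush {
        // Send only the new/changed documents; any document sharing a uniqueKey
        // with an existing Solr document simply overwrites the old version.
        public static void pushDelta(List<SolrInputDocument> changedDocs) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            solr.add(changedDocs);
            solr.commit();   // new and changed docs become searchable; untouched docs were never resent
            solr.shutdown();
        }
    }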
Re: Searching phonetic by DoubleMetaphone soundex encoder
hi, Thanks for your reply. I'm a beginner with Solr; kindly elaborate in more detail, because my solrconfig.xml contains:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">5</int>
        <str name="df">name</str>
      </lst>
    </requestHandler>
    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true"/>
    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
    <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers"/>
    <requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler"/>
    <requestHandler name="/get" class="solr.RealTimeGetHandler">
      <lst name="defaults">
        <str name="omitHeader">true</str>
        <str name="wt">json</str>
        <str name="indent">true</str>
      </lst>
    </requestHandler>

Where can I add this qf parameter for those two fields? Hope you will understand the scenario. Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-phonetic-by-DoubleMetaphone-soundex-encoder-tp4116885p4117073.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr delta indexing approach
Thanks Alex, Yes, my source system maintains the creation and last-modification time of each document. As per your inputs, can I assume that the next time Solr starts indexing, it scans all the documents present in the source but only picks for indexing those which are either new or have been updated since the last successful indexing? How does Solr do this - in short, what is Solr's strategy for indexing? I would definitely like to know more about it; if you can share your thoughts on the same, it would be great. Regards, Lalit. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117068p4117077.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr delta indexing approach
I had this problem when I started to look at Solr as an index for a file server. What I ended up doing was writing a Perl script that did this:
- Scan the whole filesystem and create an XML file that is submitted to Solr for indexing. As this might be some 600,000 files, I break it down into chunks of N files (N = 200 currently).
- At the end of a successful scan, write out the time it started to a file.
- Next time you run the script, the script looks for the start time file. It reads that in and checks every file in the system:
=> If it has a mod_time greater than the begin_time, it has changed since we last updated it, so reindex it.
=> If it doesn't, just update the last_seen timestamp in Solr (a field we created) so we know it's still there.
We're doing that and it's indexing just fine.

-----Original Message----- From: lalitjangra [mailto:lalit.j.jan...@gmail.com] Sent: Thursday, 13 February 2014 4:45 PM To: solr-user@lucene.apache.org Subject: Re: Solr delta indexing approach [snip]
Re: Solr delta indexing approach
I'd start by doing the Solr tutorial. It will explain a lot of things. But in summary, you can send data to Solr (best option) or you can pull it using DataImportHandler. Take your pick, do the tutorial, maybe read some books. Then come back with specific questions about where you're stuck. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Feb 13, 2014 at 12:45 PM, lalitjangra lalit.j.jan...@gmail.com wrote: [snip]
Re: Solr delta indexing approach
Why write a Perl script for that?

    touch new_timestamp
    find . -newer timestamp | script-to-submit
    mv new_timestamp timestamp

Neither approach deals with deleted files. To do this correctly, you need lists of all the files in the index with their timestamps, and of all the files in the repository. Then you need to difference them to find the deleted ones, the new ones, and the ones that have changed. You might even want to track links and symlinks to get dupes and canonical paths. wunder

On Feb 12, 2014, at 10:00 PM, Sadler, Anthony anthony.sad...@yrgrp.com wrote: [snip]

-- Walter Underwood wun...@wunderwood.org
Re: Solr delta indexing approach
Thanks all. I am following a couple of articles on the same. I am sending data to Solr instead of using DIH and am able to successfully index data in Solr. My concern is how to ensure that, out of all the data items, only new or updated data is indexed each time. Is this something available out of the box in Solr, or do we need to build it? Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117068p4117087.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Integrating Oauth2 with Solr MailEntityProcessor
Hi again, Anybody interested in this feature for the Solr MailEntityProcessor? WDYT? Thanks, Dileepa

On Thu, Jan 30, 2014 at 11:00 AM, Dileepa Jayakody dileepajayak...@gmail.com wrote:
Hi All, I think OAuth2 integration is a valid use case for Solr when it comes to importing data from user accounts like email, social networks, enterprise stores, etc. Do you think OAuth2 integration in Solr would be a useful feature? If so, I would like to start working on this. I feel this could also be a good project for GSoC 2014. Thanks, Dileepa

On Wed, Jan 29, 2014 at 3:57 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote:
Hi All, I'm doing a research project on Email Reputation Analysis, and for this project I'm planning to use the Apache Solr, Tika, and Mahout projects to analyse, store, and query the reputation of emails and correspondents. For indexing emails in Solr I'm going to use the MailEntityProcessor [1]. But I see that it requires the user to provide their email credentials to the DIH, which is a security risk. Also, I feel the current MailEntityProcessor doesn't allow importing data from multiple mailboxes. What do you think of integrating an authorization mechanism like OAuth2 in Solr? I'd appreciate your ideas on using this for indexing multiple mailboxes without requiring users to give their usernames and passwords.

    <document>
      <entity processor="MailEntityProcessor" user="someb...@gmail.com" password="something"
              host="imap.gmail.com" protocol="imaps" folders="x,y,z"/>
    </document>

Regards, Dileepa [1] http://wiki.apache.org/solr/MailEntityProcessor
RE: Solr delta indexing approach
At the risk of derailing the thread: we do a lot more in the script than is mentioned here. We pull out parts of the path and mangle them (for example, turn them into a UNC path for users to use, or pull out a client name or job number using a known folder structure). As for deleted files, here's how the script works in totality:
- The script runs the first time, finds every file, and puts it into the formerly empty Solr DB. For every file found, it sets date_last_seen = current_time. It writes out the last_begin_time file and ends.
- A secondary function of the script runs sometime after, looking for any file with date_last_seen < last_begin_time. Nothing is found this time around.
- The script runs the next time, sees that there is a last_begin_time file, and reads in that time. The script then runs in a delta mode, looking for all files modified later than last_begin_time. If it finds them, it re-indexes them and their contents. All other files, whose mod_time is less than last_begin_time, merely have their date_last_seen updated. The script ends and writes out the last_begin_time file. At this point, any files that were deleted between the first and second run were not updated, so their date_last_seen is different from all the others. This gives me something to look for.
- The secondary function of the script runs sometime after, looking for any file with date_last_seen < last_begin_time. This time around, some files are found. These files have their isDeleted field in Solr set to 1.
Hopefully that makes a bit more sense.

-----Original Message----- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Thursday, 13 February 2014 5:26 PM To: solr-user@lucene.apache.org Subject: Re: Solr delta indexing approach [snip]