Indexing strategies?

2014-02-12 Thread manju16832003
Hi,
I'm facing a dilemma in choosing an indexing strategy.
My application architecture is:
 - I have a listing table in my DB
 - For each listing, I make 3 calls to a URL datasource of a different system

 I have 200k records.

 The time taken to index 25 docs is 1 minute, so 200k might take more than
100 hours :-(

 I know there are a lot of factors to consider, from the network to the DB.
I'm looking for different strategies we could use to perform the indexing:

 - Can we run multiple data import handlers? One data-config for the first 100k
and a second one for the other 100k.
 - Would it be possible to write a Java service using SolrJ and perform
multi-threaded calls to Solr to index?
 - The URL datasources I'm using actually reside in the MSSQL database of a
different system. Could I speed up indexing if I used a JDBCDataSource that
queries the DB directly instead of going through the API URL datasource?

Are there any other strategies we could use?

Thank you,
 





Phonetic search on multiple fields

2014-02-12 Thread Navaa
Hi, 
I am a beginner with Solr.
I am trying to implement phonetic search in my application.
My schema.xml fieldType definitions are:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" splitOnNumerics="0"
            generateWordParts="1" stemEnglishPossessive="0"
            generateNumberParts="0"
            catenateWords="1" catenateNumbers="0" catenateAll="0"
            preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And  

<fieldType name="text_general_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" splitOnNumerics="0"
            generateWordParts="1" stemEnglishPossessive="0"
            generateNumberParts="0"
            catenateWords="1" catenateNumbers="0" catenateAll="0"
            preserveOriginal="1"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC"
            ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>



And the field definitions:

<field name="fname" type="text_general" indexed="true" stored="true"
       required="false" multiValued="false"/>
<field name="fname_copy" type="text_general_phonetic" indexed="true"
       stored="true" required="false"/>
<copyfield source="fname" dest="fname_copy"/>


When I search 'stephen' it gives me 'stephen', but searching 'stifn' does not work.
Also, how can I use the phonetic filter with the DoubleMetaphone encoder?
Please help me.
Thanks in Advance. 





Indexing in the case of entire shard failure

2014-02-12 Thread elmerfudd
We have a system which consists of 2 shards while every shard has a leader
and one replica.
During indexing, one of the shards (both leader and replica) was shut down.
We got two types of HTTP responses: rm=Service Unavailable and rm=OK.
From this we came to the conclusion that the shard which was still up went
on indexing. Is this behavior correct? We expected that if an entire shard
fails, the system stops indexing completely.






Re: Indexing in the case of entire shard failure

2014-02-12 Thread Jack Krupansky
Note: In SolrCloud terminology, a leader is also a replica. IOW, you have 
two replicas, one of which (and it can vary over time) is elected as leader 
for that shard.


The other shards remain capable of indexing even if one shard becomes 
unavailable. That is expected - and desired - behavior in a fault-tolerant, 
fully-distributed system. Your application can/should make its own decision 
as to what it will do if an indexing operation cannot be serviced.


-- Jack Krupansky
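
A minimal sketch of the application-side decision described above, assuming SolrJ 4.x;
the class name and the retry-queue strategy are illustrative, not from the original thread:

import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

public class IndexingWithFallback {
    private final Queue<SolrInputDocument> retryQueue = new ConcurrentLinkedQueue<SolrInputDocument>();

    // Try to index one document; if the target shard cannot service the request,
    // keep the document for a later retry instead of assuming all indexing has stopped.
    public void index(SolrServer server, SolrInputDocument doc) {
        try {
            server.add(doc);
        } catch (SolrServerException | IOException | SolrException e) {
            retryQueue.offer(doc);
        }
    }
}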

-Original Message- 
From: elmerfudd

Sent: Wednesday, February 12, 2014 7:54 AM
To: solr-user@lucene.apache.org
Subject: Indexing in the case of entire shard failure

We have a system which consists of 2 shards while every shard has a leader
and one replica.
During indexing one of the shards (both leader and replica) was shut down.
We got two types of HTTP requests: rm= Service Unavailable and rm=OK.

From this we’ve got to the conclusion that the shard which was up was going

on indexing. Is this behavior correct? We expected that in the case of the
entire shard fails the system stop indexing completely.







Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Navaa
Hi,
I am using Solr to search for phonetically equivalent strings.
My schema contains:
<fieldType name="text_general_doubleMetaphone" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone"
            inject="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone"
            inject="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and the fields are:

<field name="fname" type="text_general" indexed="true" stored="true"
       required="false"/>
<field name="fname_sound" type="text_general_doubleMetaphone" indexed="true"
       stored="true" required="false"/>
<copyField source="fname" dest="fname_sound"/>


It works when I search 'stfn' => it returns 'stephen' and 'stephn'.

But I am also expecting 'stephn' => 'stephen'.
How will I get this result? Am I doing something wrong?

Thanks in advance





Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Pisarev, Vitaliy
I am running a very simple performance experiment where I post 2000 documents
to my application, which in turn persists them to a relational DB and sends them
to Solr for indexing (synchronously, in the same request).
I am testing 3 use cases:

  1.  No indexing at all - ~45 sec to post 2000 documents
  2.  Indexing included - commit after each add - ~8 minutes (!) to post and
index 2000 documents
  3.  Indexing included - commitWithin 1 ms - ~55 seconds (!) to post and index
2000 documents

The 3rd result does not make any sense; I would expect the behavior to be
similar to the one in point 2. At first I thought that the documents were not
really committed, but I could actually see them being added by executing some
queries during the experiment (via the Solr web UI).
I am worried that I am missing something very big. The code I use for point 2:

SolrInputDocument doc = // get doc
SolrServer solrConnection = // get connection
solrConnection.add(doc);
solrConnection.commit();

Whereas the code for point 3:

SolrInputDocument doc = // get doc
SolrServer solrConnection = // get connection
solrConnection.add(doc, 1); // According to the API documentation I understand
there is no need to explicitly call commit with this API

Is it possible that committing after each add will degrade performance by a
factor of 40?



Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Mark Miller
Doing a standard commit after every document is a Solr anti-pattern.

commitWithin is a “near-realtime” commit in recent versions of Solr and not a 
standard commit.

https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching

- Mark

http://about.me/markrmiller

On Feb 12, 2014, at 9:52 AM, Pisarev, Vitaliy vitaliy.pisa...@hp.com wrote:

 I am running a very simple performance experiment where I post 2000 documents 
 to my application. Who in turn persists them to a relational DB and sends 
 them to Solr for indexing (Synchronously, in the same request).
 I am testing 3 use cases:
 
  1.  No indexing at all - ~45 sec to post 2000 documents
  2.  Indexing included - commit after each add. ~8 minutes (!) to post and 
 index 2000 documents
  3.  Indexing included - commitWithin 1ms ~55 seconds (!) to post and index 
 2000 documents
 The 3rd result does not make any sense, I would expect the behavior to be 
 similar to the one in point 2. At first I thought that the documents were not 
 really committed but I could actually see them being added by executing some 
 queries during the experiment (via the solr web UI).
 I am worried that I am missing something very big. The code I use for point 2:
 SolrInputDocument = // get doc
 SolrServer solrConnection = // get connection
 solrConnection.add(doc);
 solrConnection.commit();
 Whereas the code for point 3:
 SolrInputDocument = // get doc
 SolrServer solrConnection = // get connection
 solrConnection.add(doc, 1); // According to API documentation I understand 
 there is no need to explicitly call commit with this API
 Is it possible that committing after each add will degrade performance by a 
 factor of 40?
 



Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Joel Bernstein
Yes, committing after each document will greatly degrade performance. I
typically use autoCommit and autoSoftCommit to set the time interval
between commits, but commitWithin should have a similar effect. I often
see performance of 2000+ docs per second on the load using auto commits.
When explicitly committing after each document, your commits will happen
too frequently, overworking the indexing process.

Joel Bernstein
Search Engineer at Heliosearch
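
A hedged sketch of the autoCommit / autoSoftCommit approach described above; the intervals
are examples only, and the snippet belongs inside solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>600000</maxTime>        <!-- hard commit (flush to disk) every 10 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>          <!-- soft commit (search visibility) every second -->
  </autoSoftCommit>
</updateHandler>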


On Wed, Feb 12, 2014 at 9:52 AM, Pisarev, Vitaliy vitaliy.pisa...@hp.comwrote:

 I am running a very simple performance experiment where I post 2000
 documents to my application. Who in turn persists them to a relational DB
 and sends them to Solr for indexing (Synchronously, in the same request).
 I am testing 3 use cases:

   1.  No indexing at all - ~45 sec to post 2000 documents
   2.  Indexing included - commit after each add. ~8 minutes (!) to post
 and index 2000 documents
   3.  Indexing included - commitWithin 1ms ~55 seconds (!) to post and
 index 2000 documents
 The 3rd result does not make any sense, I would expect the behavior to be
 similar to the one in point 2. At first I thought that the documents were
 not really committed but I could actually see them being added by executing
 some queries during the experiment (via the solr web UI).
 I am worried that I am missing something very big. The code I use for
 point 2:
 SolrInputDocument = // get doc
 SolrServer solrConnection = // get connection
 solrConnection.add(doc);
 solrConnection.commit();
 Whereas the code for point 3:
 SolrInputDocument = // get doc
 SolrServer solrConnection = // get connection
 solrConnection.add(doc, 1); // According to API documentation I understand
 there is no need to explicitly call commit with this API
 Is it possible that committing after each add will degrade performance by
 a factor of 40?




Newb - Search not returning any results

2014-02-12 Thread leevduhl
I set up a Solr core and populated it with documents, but I am not able to get
any results when attempting to search the documents.

A generic search (q=*.*) returns all documents (and fields/values within
those documents); however, when I try to search using specific criteria I get
no results back.

I have the following setup in my Schema.xml:
<field name="id" type="string" indexed="true" stored="true" required="true"
       multiValued="false"/>
<field name="mailingcity" type="string" indexed="true" stored="true"
       multiValued="false"/>

I run the following query against my Solr instance:
http://{domain}:8983/solr/MIM/select?q=*&rows=2&fl=id+mailingcity&wt=json&indent=true&debugQuery=true

I get the following results back:
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "debugQuery":"true",
      "fl":"id, mailingcity",
      "indent":"true",
      "q":"*",
      "wt":"json",
      "rows":"2"}},
  "response":{"numFound":18024,"start":0,"docs":[
      {
        "id":"214530123",
        "mailingcity":"redford"},
      {
        "id":"204686608",
        "mailingcity":"detroit"}]
  },
  "debug":{
    "rawquerystring":"*",
    "querystring":"*",
    "parsedquery":"MatchAllDocsQuery(*:*)",
    "parsedquery_toString":"*:*",
    "explain":{
      "214530123":"\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n  1.0 = queryNorm\n",
      "204686608":"\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n  1.0 = queryNorm\n"},
    "QParser":"LuceneQParser",
  ...
}

However, when I try to search specifically where mailingcity=redford I
don't get any results back.  See the following query/results.

Query:
http://{domain}:8983/solr/MIM/select?q=mailingcity=redford&rows=2&fl=id,mailingcity&wt=json&indent=true&debugQuery=true

Results:
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "debugQuery": "true",
      "fl": "id,mailingcity",
      "indent": "true",
      "q": "mailingcity=redford",
      "_": "1392218691700",
      "wt": "json",
      "rows": "2"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  },
  "debug": {
    "rawquerystring": "mailingcity=redford",
    "querystring": "mailingcity=redford",
    "parsedquery": "text:mailingcity text:redford",
    "parsedquery_toString": "text:mailingcity text:redford",
    "explain": {},
    "QParser": "LuceneQParser",
  ...
}

If anyone can provide some info on why this is happening and how to solve it
it would be appreciated.

Thank You
Lee





RE: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Pisarev, Vitaliy
I absolutely agree and I even read the NRT page before posting this question.

The thing that baffles me is this:

Doing a commit after each add kills the performance.
On the other hand, when I use commitWithin and specify an (absurd) 1 ms delay,
I expect that this behavior will be equivalent to making a commit, from a
functional perspective.

Seeing that there is no magic in the world, I am trying to understand what
price I am actually paying when using the commitWithin feature: on the one
hand it commits almost immediately; on the other hand, it performs wonderfully.
Where is the catch?


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, 12 February 2014 17:00
To: solr-user
Subject: Re: Solr performance with commitWithin seems too good to be true. I am
afraid I am missing something

Doing a standard commit after every document is a Solr anti-pattern.

commitWithin is a “near-realtime” commit in recent versions of Solr and not a 
standard commit.

https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching

- Mark

http://about.me/markrmiller

On Feb 12, 2014, at 9:52 AM, Pisarev, Vitaliy vitaliy.pisa...@hp.com wrote:

 I am running a very simple performance experiment where I post 2000 documents 
 to my application. Who in turn persists them to a relational DB and sends 
 them to Solr for indexing (Synchronously, in the same request).
 I am testing 3 use cases:
 
  1.  No indexing at all - ~45 sec to post 2000 documents  2.  Indexing 
 included - commit after each add. ~8 minutes (!) to post and index 
 2000 documents  3.  Indexing included - commitWithin 1ms ~55 seconds 
 (!) to post and index 2000 documents The 3rd result does not make any sense, 
 I would expect the behavior to be similar to the one in point 2. At first I 
 thought that the documents were not really committed but I could actually see 
 them being added by executing some queries during the experiment (via the 
 solr web UI).
 I am worried that I am missing something very big. The code I use for point 2:
 SolrInputDocument = // get doc
 SolrServer solrConnection = // get connection solrConnection.add(doc); 
 solrConnection.commit(); Whereas the code for point 3:
 SolrInputDocument = // get doc
 SolrServer solrConnection = // get connection solrConnection.add(doc, 
 1); // According to API documentation I understand there is no need to 
 explicitly call commit with this API Is it possible that committing after 
 each add will degrade performance by a factor of 40?
 



Re: Importing database DIH

2014-02-12 Thread Gora Mohanty
On 12 February 2014 20:53, Maheedhar Kolla maheedhar.ko...@gmail.com wrote:

 Hi ,


 I need help with importing data, through DIH.  ( using solr-3.6.1, tomcat6 )

  I see the following error when I try to do a full-import from my
 local MySQL table ( http:/s/solr//dataimport?command=full-import
 ).

 <snip>
 ..
 <str name="Total Requests made to DataSource">0</str>
 <str name="Total Rows Fetched">0</str>
 <str name="Total Documents Processed">0</str>
 <str name="Total Documents Skipped">0</str>
 <str name="">Indexing failed. Rolled back all changes.</str>
 </snip>

 I did search to find ways to solve this problem and did create the
 file dataimport.properties , but no success.
[...]

You do not have to create dataimport.properties. Look in the Tomcat logs
for more details on the error, and post the relevant sections here if you
cannot make sense of it. My guess would be that your database credentials
are incorrect, or that the SELECT is failing. Try logging into mysql from
an admin. tool with those credentials, and running the SELECT manually.

Regards,
Gora
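
A minimal sketch of that sanity check over plain JDBC, using the same placeholder
credentials and query as in the data-config.xml from the post (DBNAME/USER/PWD/TABLENAME
are the post's placeholders, not real values):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DihSanityCheck {
    public static void main(String[] args) throws Exception {
        // Same driver, URL, credentials and SELECT as the DIH configuration.
        Class.forName("com.mysql.jdbc.Driver");
        Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/DBNAME", "USER", "PWD");
        Statement st = c.createStatement();
        ResultSet rs = st.executeQuery("select * from TABLENAME");
        int rows = 0;
        while (rs.next()) {
            rows++;
        }
        System.out.println("Fetched " + rows + " rows");
        rs.close();
        st.close();
        c.close();
    }
}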


Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Eric_Peng
 I was just trying to use the SolrJ client to import XML data to the Solr server, and
I read the SolrJ wiki, which says SolrJ lets you upload content in XML and binary
format.

I realized there is an XML parser in Solr (we can use a data update handler in
the default Solr UI, under Solr Core > Dataimport).

So I was wondering how to directly use the Solr XML parser to upload XML
using SolrJ Java code? I could use another open-source XML parser, but I
really want to know if there is a way to call the Solr parser library.

Would you mind sending me some simple code if possible? Really appreciated.
Thanks in advance.

solr/4.6.1
 





Importing database DIH

2014-02-12 Thread Maheedhar Kolla
Hi ,


I need help with importing data, through DIH.  ( using solr-3.6.1, tomcat6 )

 I see the following error when I try to do a full-import from my
local MySQL table ( http:/s/solr//dataimport?command=full-import
).

<snip>
..
<str name="Total Requests made to DataSource">0</str>
<str name="Total Rows Fetched">0</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="">Indexing failed. Rolled back all changes.</str>
</snip>

I did search to find ways to solve this problem and did create the
file dataimport.properties , but no success.

Any help would be appreciated.


cheers,
Kolla

PS:  When I check the admin panel for statistics for the query
/dataimport , I see the following:

Status : IDLE
Documents Processed : 0
Requests made to DataSource : 0
Rows Fetched : 0
Documents Deleted : 0
Documents Skipped : 0
Total Documents Processed : 0
Total Requests made to DataSource : 0
Total Rows Fetched : 0
Total Documents Deleted : 0
Total Documents Skipped : 0
handlerStart : 1391612468278
requests : 5
errors : 0
timeouts : 0
totalTime : 28
avgTimePerRequest : 5.6


Also, Here is my dataconfig file.
<dataConfig>
 <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
             url="jdbc:mysql://localhost:3306/DBNAME"
             user="USER" password="PWD"/>
 <document>
  <entity name="docs" query="select * from TABLENAME">
    <field column="id" name="id"/>
    <field column="content" name="text"/>
    <field column="title" name="title"/>
  </entity>
 </document>
</dataConfig>





-- 
Cheers,
Kolla


RE: Importing database DIH

2014-02-12 Thread Pisarev, Vitaliy
It can be anything from wrong credentials, to missing driver in the class path, 
to malformed connection string, etc..

What does the Solr log say? 

-Original Message-
From: Maheedhar Kolla [mailto:maheedhar.ko...@gmail.com] 
Sent: Wednesday, 12 February 2014 17:23
To: solr-user@lucene.apache.org
Subject: Importing database DIH

Hi ,


I need help with importing data, through DIH.  ( using solr-3.6.1, tomcat6 )

 I see the following error when I try to do a full-import from my local MySQL 
table ( http:/s/solr//dataimport?command=full-import
).

<snip>
..
<str name="Total Requests made to DataSource">0</str>
<str name="Total Rows Fetched">0</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="">Indexing failed. Rolled back all changes.</str>
</snip>

I did search to find ways to solve this problem and did create the file 
dataimport.properties , but no success.

Any help would be appreciated.


cheers,
Kolla

PS:  When I check the admin panel for statistics for the query /dataimport , I 
see the following:

Status : IDLE
Documents Processed : 0
Requests made to DataSource : 0
Rows Fetched : 0
Documents Deleted : 0
Documents Skipped : 0
Total Documents Processed : 0
Total Requests made to DataSource : 0
Total Rows Fetched : 0
Total Documents Deleted : 0
Total Documents Skipped : 0
handlerStart : 1391612468278
requests : 5
errors : 0
timeouts : 0
totalTime : 28
avgTimePerRequest : 5.6


Also, Here is my dataconfig file.
<dataConfig>
 <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
             url="jdbc:mysql://localhost:3306/DBNAME"
             user="USER" password="PWD"/>
 <document>
  <entity name="docs" query="select * from TABLENAME">
    <field column="id" name="id"/>
    <field column="content" name="text"/>
    <field column="title" name="title"/>
  </entity>
 </document>
</dataConfig>





--
Cheers,
Kolla


Re: Newb - Search not returning any results

2014-02-12 Thread Gora Mohanty
On 12 February 2014 20:57, leevduhl ld...@corp.realcomp.com wrote:
[...]
 However, when I try to search specifically where mailingcity=redford I
 don't get any results back.  See the following query/results.

 Query:
 http://{domain}:8983/solr/MIM/select?q=mailingcity=redfordrows=2fl=id,mailingcitywt=jsonindent=truedebugQuery=true

Please start by reading https://wiki.apache.org/solr/SolrQuerySyntax
The argument to 'q' above should be mailingcity:redford. The debug
section in the results even tells you that, as the parsedquery becomes
text:mailingcity text:redford which means that it is searching the
default full-text search field for the strings mailingcity and/or redford.

Regards,
Gora
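
Applied to the query from the original post, the corrected request would look something
like this (host and core as in the post):

http://{domain}:8983/solr/MIM/select?q=mailingcity:redford&rows=2&fl=id,mailingcity&wt=json&indent=true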


Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Dmitry Kan
Cross-posting my answer from SO:

According to this wiki:

https://wiki.apache.org/solr/NearRealtimeSearch

the commitWithin is a soft-commit by default. Soft-commits are very
efficient in terms of making the added documents immediately searchable.
But! They are not on the disk yet. That means the documents are being
committed into RAM. In this setup you would use the updateLog to make the Solr
instance crash tolerant.

What you do in point 2 is hard-commit, i.e. flush the added documents to
disk. Doing this after each document add is very expensive. So instead,
post a bunch of documents and issue a hard commit, or even have your
autoCommit set to some reasonable value, like 10 min or 1 hour (depends on
your user expectations).
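
A hedged sketch of the transaction log mentioned above, enabled inside <updateHandler>
in solrconfig.xml so soft-committed (RAM-only) documents can be replayed after a crash:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>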



On Wed, Feb 12, 2014 at 5:28 PM, Pisarev, Vitaliy vitaliy.pisa...@hp.comwrote:

 I absolutely agree and I even read the NRT page before posting this
 question.

 The thing that baffles me is this:

 Doing a commit after each add kills the performance.
 On the other hand, when I use commit within and specify an (absurd) 1ms
 delay,- I expect that this behavior will be equivalent to making a commit-
 from a functional perspective.

 Seeing that there is no magic in the world, I am trying to understand what
 is the price I am actually paying when using the commitWithin feature, on
 the one hand it commits almost immediately, on the other hand, it performs
 wonderfully. Where is the catch?


 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Wednesday, 12 February 2014 17:00
 To: solr-user
 Subject: Re: Solr performance with commitWithin seems too good to be true.
 I am afraid I am missing something

 Doing a standard commit after every document is a Solr anti-pattern.

 commitWithin is a “near-realtime” commit in recent versions of Solr and
 not a standard commit.

 https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching

 - Mark

 http://about.me/markrmiller

 On Feb 12, 2014, at 9:52 AM, Pisarev, Vitaliy vitaliy.pisa...@hp.com
 wrote:

  I am running a very simple performance experiment where I post 2000
 documents to my application. Who in turn persists them to a relational DB
 and sends them to Solr for indexing (Synchronously, in the same request).
  I am testing 3 use cases:
 
   1.  No indexing at all - ~45 sec to post 2000 documents  2.  Indexing
  included - commit after each add. ~8 minutes (!) to post and index
  2000 documents  3.  Indexing included - commitWithin 1ms ~55 seconds
  (!) to post and index 2000 documents The 3rd result does not make any
 sense, I would expect the behavior to be similar to the one in point 2. At
 first I thought that the documents were not really committed but I could
 actually see them being added by executing some queries during the
 experiment (via the solr web UI).
  I am worried that I am missing something very big. The code I use for
 point 2:
  SolrInputDocument = // get doc
  SolrServer solrConnection = // get connection solrConnection.add(doc);
  solrConnection.commit(); Whereas the code for point 3:
  SolrInputDocument = // get doc
  SolrServer solrConnection = // get connection solrConnection.add(doc,
  1); // According to API documentation I understand there is no need to
  explicitly call commit with this API Is it possible that committing
 after each add will degrade performance by a factor of 40?
 




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: Indexing strategies?

2014-02-12 Thread Erick Erickson
I'd seriously consider a SolrJ program that pulled the necessary data from
two of your systems, held it in cache and then pulled the data from your
main system and enriched it with the cached data.

Or export your information from your remote systems and import them into
a single system where you could do joins.

I believe DIH has some caching ability too that you might consider.

Your basic problem is an inefficient data model where you have to query
these different systems on a row-by-row basis; that's where I'd concentrate
my energies.

Best,
Erick
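
A minimal sketch of the multi-threaded SolrJ approach discussed above (SolrJ 4.x);
the core name "listings", the batch size/thread count, and the fetchBatch() helper
are assumptions standing in for your own DB + URL-datasource/cache lookups:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        final HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/listings");
        ExecutorService pool = Executors.newFixedThreadPool(8);

        for (int start = 0; start < 200000; start += 1000) {
            final int offset = start;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // fetchBatch() is a placeholder for pulling and enriching listings.
                        List<SolrInputDocument> docs = fetchBatch(offset, 1000);
                        solr.add(docs);   // one HTTP request per batch, not per document
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(2, TimeUnit.HOURS);
        solr.commit();                    // single commit at the end
        solr.shutdown();
    }

    // Placeholder: build SolrInputDocuments for `rows` listings starting at `offset`.
    static List<SolrInputDocument> fetchBatch(int offset, int rows) {
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < rows; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", String.valueOf(offset + i));
            docs.add(doc);
        }
        return docs;
    }
}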


On Wed, Feb 12, 2014 at 2:09 AM, manju16832003 manju16832...@gmail.comwrote:

 Hi,
 I'm facing a dilemma of choosing the indexing strategies.
 My application architecture is
  - I have a listing table in my DB
  - For each listing, I have 3 calls to a URL Datasource of different system

  I have 200k records

  Time taken to index 25 docs is 1Minute, so for 200k it might take more
 than
 100hrs :-(?


  I know there are lot of factors to consider from Network to DB.
 I'm looking for different strategies that we could perform index.

  - Can we run multiple data import handlers? one data-config for first 100k
 and second one is for another 100k
  - Would it be possible to write java service using SolrJ and perform
 multi-threaded calls to Solr to Index?
  - The URL Datasources i'm using is actually resided in MSSQL database of
 different system. Could I be able to fasten indexing time if I just could
 use JDBCDataSource that calls DB directly instead through API URL data
 source?

 Is there any other strategies we could use?

 Thank you,







Re: Phonetic search on multiple fields

2014-02-12 Thread Erick Erickson
First, why are you talking about DoubleMetaphone when
your fieldType uses BeiderMorseFilterFactory? Which points
up a basic issue you need to wrap your head around or you'll
be endlessly confused. At least I was...

Your analysis chains _must_ do compatible things at index and
query time. The fieldType you're using for phonetic searching does
not since it doesn't use the BeiderMorseFilterFactory at query time.

So the actual values in your index are whatever the Beider... factory
produces but the terms searched are NOT transformed by that factory.

Say you index the term Erick. Your index may have (and I
don't remember what the actual output of Beider is) something
totally transformed like MNUA. But your query does NOT do the
transformation, so the query is looking for Erick. Obviously it
isn't found.

I _strongly_ advise you to take some time to get familiar with the
admin/analysis page, that'll shed light on a _lot_ of analysis issues.

Best,
Erick
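
A hedged sketch of what a symmetric configuration might look like, with the same
Beider-Morse filter applied in BOTH analyzers so indexed and queried terms are
transformed the same way (filter order is simplified; not a drop-in replacement):

<fieldType name="text_general_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC"
            ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC"
            ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
</fieldType>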



On Wed, Feb 12, 2014 at 4:26 AM, Navaa 
navnath.thomb...@xtremumsolutions.com wrote:

 Hi,
 I am beginner of solr,
 I am trying to implement phonetic search in my application
 my code in schema.xml for fieldType

 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             splitOnCaseChange="1" splitOnNumerics="0"
             generateWordParts="1" stemEnglishPossessive="0"
             generateNumberParts="0"
             catenateWords="1" catenateNumbers="0" catenateAll="0"
             preserveOriginal="1"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 And

 <fieldType name="text_general_phonetic" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             splitOnCaseChange="1" splitOnNumerics="0"
             generateWordParts="1" stemEnglishPossessive="0"
             generateNumberParts="0"
             catenateWords="1" catenateNumbers="0" catenateAll="0"
             preserveOriginal="1"/>
     <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC"
             ruleType="APPROX" concat="true" languageSet="auto"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>



 AND field definition

 <field name="fname" type="text_general" indexed="true" stored="true"
        required="false" multiValued="false"/>
 <field name="fname_copy" type="text_general_phonetic" indexed="true"
        stored="true" required="false"/>
 <copyfield source="fname" dest="fname_copy"/>


 when I am search stephen, stifn will gives me stephen but it wont works...
 Also if how can I use phonetic filter with DoubleMetaphone encoder..
 Please help me
 Thanks in Advance.






Re: Importing database DIH

2014-02-12 Thread Maheedhar Kolla
Thanks for the comments/advice. I did mess with the drivers ( by
deliberately moving the libs) and it did fail as it is supposed to.

When I looked into catalina.out, I realized that the problem lay with the
data directory being owned by root instead of tomcat6. I changed it
so that tomcat6 can write to the data directory, and now I see a different
error (after processing 100+ rows). But I am happy that the initial
problem is gone. I should be able to fix these minor ones.

Thanks for the advice again.

Cheers,
Kolla

PS: I think I was looking into the wrong logs before :/



On Wed, Feb 12, 2014 at 10:31 AM, Pisarev, Vitaliy
vitaliy.pisa...@hp.com wrote:
 It can be anything from wrong credentials, to missing driver in the class 
 path, to malformed connection string, etc..

 What does the Solr log say?

 -Original Message-
 From: Maheedhar Kolla [mailto:maheedhar.ko...@gmail.com]
 Sent: Wednesday, 12 February 2014 17:23
 To: solr-user@lucene.apache.org
 Subject: Importing database DIH

 Hi ,


 I need help with importing data, through DIH.  ( using solr-3.6.1, tomcat6 )

  I see the following error when I try to do a full-import from my local MySQL 
 table ( http:/s/solr//dataimport?command=full-import
 ).

  <snip>
  ..
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">0</str>
  <str name="Total Documents Processed">0</str>
  <str name="Total Documents Skipped">0</str>
  <str name="">Indexing failed. Rolled back all changes.</str>
  </snip>

 I did search to find ways to solve this problem and did create the file 
 dataimport.properties , but no success.

 Any help would be appreciated.


 cheers,
 Kolla

 PS:  When I check the admin panel for statistics for the query /dataimport , 
 I see the following:

 Status : IDLE
 Documents Processed : 0
 Requests made to DataSource : 0
 Rows Fetched : 0
 Documents Deleted : 0
 Documents Skipped : 0
 Total Documents Processed : 0
 Total Requests made to DataSource : 0
 Total Rows Fetched : 0
 Total Documents Deleted : 0
 Total Documents Skipped : 0
 handlerStart : 1391612468278
 requests : 5
 errors : 0
 timeouts : 0
 totalTime : 28
 avgTimePerRequest : 5.6


 Also, Here is my dataconfig file.
  <dataConfig>
   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
               url="jdbc:mysql://localhost:3306/DBNAME"
               user="USER" password="PWD"/>
   <document>
    <entity name="docs" query="select * from TABLENAME">
      <field column="id" name="id"/>
      <field column="content" name="text"/>
      <field column="title" name="title"/>
    </entity>
   </document>
  </dataConfig>





 --
 Cheers,
 Kolla



-- 
Cheers,
Kolla


Re: Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Erick Erickson
Hmmm, before going there let's be sure you're trying to do
what you think you are.

Solr does _not_ index arbitrary XML. There is a very
specific format of XML that describes solr documents
that _can_ be indexed. But random XML is not
supported. See the documents in example/exampledocs
for the XML form of Solr docs.

So if you have arbitrary XML, you need to parse it and then
construct Solr documents. One way would be to use
SolrJ, parse the docs using your favorite Java parser and
construct SolrInputDocuments which you then use one of
the SolrServer classes (e.g. CloudSolrServer) to add to the index.

There really is no Solr XML parser that I know of; Solr just
uses one of the standard XML parsers (e.g. SAX)...

Best,
Erick
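
A minimal sketch of that approach, assuming SolrJ 4.x and a hypothetical input format
(the file name "records.xml" and the element/field names "record", "id", "title" are
assumptions, not part of any Solr API):

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmlToSolr {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Parse the arbitrary XML with a standard Java parser...
        Document xml = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File("records.xml"));
        NodeList records = xml.getElementsByTagName("record");

        // ...then build SolrInputDocuments and send them via SolrJ.
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < records.getLength(); i++) {
            Element rec = (Element) records.item(i);
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rec.getAttribute("id"));
            doc.addField("title", rec.getElementsByTagName("title").item(0).getTextContent());
            docs.add(doc);
        }
        solr.add(docs);
        solr.commit();
        solr.shutdown();
    }
}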


On Wed, Feb 12, 2014 at 7:21 AM, Eric_Peng sagittariuse...@gmail.comwrote:

  I was just trying to use SolrJ Client to import XML data to Solr server.
 And
 I read SolrJ wiki that says SolrJ lets you upload content in XML and
 Binary
 format

 I realized there is a XML parser in Solr (We can use a dataUpadateHandler
 in
 Solr default UI Solr Core Dataimport)

 So I was wondering how to directly use solr xml parser to upload xml by
 using SolrJ Java Code? I could use other open-source xml parser, But I
 really want to know if there is a way to call Solr parser library.

 Would you mind send me a simple code if possible, really appreciated.
 Thanks in advance.

 solr/4.6.1







Re: Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Shawn Heisey
On 2/12/2014 8:21 AM, Eric_Peng wrote:
  I was just trying to use SolrJ Client to import XML data to Solr server. And
 I read SolrJ wiki that says SolrJ lets you upload content in XML and Binary
 format 
 
 I realized there is a XML parser in Solr (We can use a dataUpadateHandler in
 Solr default UI Solr Core Dataimport)
 
 So I was wondering how to directly use solr xml parser to upload xml by
 using SolrJ Java Code? I could use other open-source xml parser, But I
 really want to know if there is a way to call Solr parser library.
 
 Would you mind send me a simple code if possible, really appreciated.
 Thanks in advance.
 
 solr/4.6.1

When the docs say that SolrJ lets you upload data in XML and binary
format, what they actually mean is that SolrJ will create an update
request that is formatted using XML, not that it will let you send
arbitrary XML data.  It is referring to the specific XML format shown here:

http://wiki.apache.org/solr/UpdateXmlMessages#add.2Freplace_documents

As for an XML parser ... SolrJ's XMLResponseParser is a class that
accepts XML *responses* from Solr and translates them into the Java
response object.  There is also BinaryResponseParser.

The only things that I am aware of in Solr that will deal with XML as
the data source are the XPathEntityProcessor in the dataimport handler
and the ExtractingRequestHandler which uses Apache Tika.  Both of these
are actually contrib modules -- jar files for these features are in the
download, but not built into Solr or SolrJ.

If you are using the extracting request handler, you could probably use
the DirectXmlRequest object, where 'xml' is a String with the xml in it:

  DirectXmlRequest req = new DirectXmlRequest("/update/extract", xml);
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.set("someParam", "someValue");
  req.setParams(params);
  NamedList<Object> response = solrServer.request(req);

I hope that you are right and there actually is an XML parser built into
SolrJ.  We would both learn something.

Thanks,
Shawn



Re: Newb - Search not returning any results

2014-02-12 Thread leevduhl
Thanks, the syntax correction solved the problem. I actually thought I had tried
that before I posted.

Thanks
Lee





Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Erick Erickson
Here's some additional background that may shed light on the
performance..

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


On Wed, Feb 12, 2014 at 7:40 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Cross-posting my answer from SO:

 According to this wiki:

 https://wiki.apache.org/solr/NearRealtimeSearch

 the commitWithin is a soft-commit by default. Soft-commits are very
 efficient in terms of making the added documents immediately searchable.
 But! They are not on the disk yet. That means the documents are being
 committed into RAM. In this setup you would use updateLog to be solr
 instance crash tolerant.

 What you do in point 2 is hard-commit, i.e. flush the added documents to
 disk. Doing this after each document add is very expensive. So instead,
 post a bunch of documents and issue a hard commit or even have you
 autoCommit set to some reasonable value, like 10 min or 1 hour (depends on
 your user expectations).



 On Wed, Feb 12, 2014 at 5:28 PM, Pisarev, Vitaliy vitaliy.pisa...@hp.com
 wrote:

  I absolutely agree and I even read the NRT page before posting this
  question.
 
  The thing that baffles me is this:
 
  Doing a commit after each add kills the performance.
  On the other hand, when I use commit within and specify an (absurd) 1ms
  delay,- I expect that this behavior will be equivalent to making a
 commit-
  from a functional perspective.
 
  Seeing that there is no magic in the world, I am trying to understand
 what
  is the price I am actually paying when using the commitWithin feature, on
  the one hand it commits almost immediately, on the other hand, it
 performs
  wonderfully. Where is the catch?
 
 
  -Original Message-
  From: Mark Miller [mailto:markrmil...@gmail.com]
  Sent: יום ד 12 פברואר 2014 17:00
  To: solr-user
  Subject: Re: Solr performance with commitWithin seems too good to be
 true.
  I am afraid I am missing something
 
  Doing a standard commit after every document is a Solr anti-pattern.
 
  commitWithin is a “near-realtime” commit in recent versions of Solr and
  not a standard commit.
 
 
 https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
 
  - Mark
 
  http://about.me/markrmiller
 
  On Feb 12, 2014, at 9:52 AM, Pisarev, Vitaliy vitaliy.pisa...@hp.com
  wrote:
 
   I am running a very simple performance experiment where I post 2000
  documents to my application. Who in turn persists them to a relational DB
  and sends them to Solr for indexing (Synchronously, in the same request).
   I am testing 3 use cases:
  
1.  No indexing at all - ~45 sec to post 2000 documents  2.  Indexing
   included - commit after each add. ~8 minutes (!) to post and index
   2000 documents  3.  Indexing included - commitWithin 1ms ~55 seconds
   (!) to post and index 2000 documents The 3rd result does not make any
  sense, I would expect the behavior to be similar to the one in point 2.
 At
  first I thought that the documents were not really committed but I could
  actually see them being added by executing some queries during the
  experiment (via the solr web UI).
   I am worried that I am missing something very big. The code I use for
  point 2:
   SolrInputDocument = // get doc
   SolrServer solrConnection = // get connection solrConnection.add(doc);
   solrConnection.commit(); Whereas the code for point 3:
   SolrInputDocument = // get doc
   SolrServer solrConnection = // get connection solrConnection.add(doc,
   1); // According to API documentation I understand there is no need to
   explicitly call commit with this API Is it possible that committing
  after each add will degrade performance by a factor of 40?
  
 
 


 --
 Dmitry
 Blog: http://dmitrykan.blogspot.com
 Twitter: twitter.com/dmitrykan



Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Jack Krupansky
The explicit commit will cause your app to be delayed until that commit 
completes, and then Solr would be idle until that request completion makes 
its way back to your app and you submit another request which finds its way 
to Solr, maybe a few ms. That includes network latency. That interval of 
time could well be more than enough for the short-interval autoCommit or 
commitWithin to run in the background and in parallel with the request 
return to your app and the submission by your app of the subsequent request.


The magic of asynchronous operation in a parallel and distributed computing 
environment, coupled with multi-core processors and parallel threads.


-- Jack Krupansky

-Original Message- 
From: Pisarev, Vitaliy

Sent: Wednesday, February 12, 2014 10:28 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr performance with commitWithin seems too good to be true. I
am afraid I am missing something


I absolutely agree and I even read the NRT page before posting this 
question.


The thing that baffles me is this:

Doing a commit after each add kills the performance.
On the other hand, when I use commit within and specify an (absurd) 1ms 
delay,- I expect that this behavior will be equivalent to making a commit- 
from a functional perspective.


Seeing that there is no magic in the world, I am trying to understand what 
is the price I am actually paying when using the commitWithin feature, on 
the one hand it commits almost immediately, on the other hand, it performs 
wonderfully. Where is the catch?



-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Wednesday, 12 February 2014 17:00
To: solr-user
Subject: Re: Solr performance with commitWithin seems too good to be true. I
am afraid I am missing something


Doing a standard commit after every document is a Solr anti-pattern.

commitWithin is a “near-realtime” commit in recent versions of Solr and not 
a standard commit.


https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching

- Mark

http://about.me/markrmiller

On Feb 12, 2014, at 9:52 AM, Pisarev, Vitaliy vitaliy.pisa...@hp.com 
wrote:


I am running a very simple performance experiment where I post 2000 
documents to my application. Who in turn persists them to a relational DB 
and sends them to Solr for indexing (Synchronously, in the same request).

I am testing 3 use cases:

 1.  No indexing at all - ~45 sec to post 2000 documents  2.  Indexing
included - commit after each add. ~8 minutes (!) to post and index
2000 documents  3.  Indexing included - commitWithin 1ms ~55 seconds
(!) to post and index 2000 documents The 3rd result does not make any 
sense, I would expect the behavior to be similar to the one in point 2. At 
first I thought that the documents were not really committed but I could 
actually see them being added by executing some queries during the 
experiment (via the solr web UI).
I am worried that I am missing something very big. The code I use for 
point 2:

SolrInputDocument = // get doc
SolrServer solrConnection = // get connection solrConnection.add(doc);
solrConnection.commit(); Whereas the code for point 3:
SolrInputDocument = // get doc
SolrServer solrConnection = // get connection solrConnection.add(doc,
1); // According to API documentation I understand there is no need to
explicitly call commit with this API Is it possible that committing after 
each add will degrade performance by a factor of 40?




Re: Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Eric_Peng
Thanks a lot, learnt a lot from it





Re: Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Eric_Peng
Thank you so much Erick, I will try to write my own XML parser.






Re: Using numeric ranges in Solr query

2014-02-12 Thread Rafał Kuć
Hello!

Just specify the left boundary, like: price:[900 TO 1000]

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
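
Sent as a filter query, for example (a hedged sketch assuming the default core name
and a numeric price field):

http://localhost:8983/solr/collection1/select?q=*:*&fq=price:[900%20TO%201000]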


 When user enter a price in price field, for Ex: 1000 USD, i want to fetch all
 items with price around 1000 USD. I found in documentation that i can use
 price:[* to 1000] like that. It will get all items with from 1 to 1000 USD.
 But i want to get results where price is between 900 to 1000 USD. 

 Any help is appreciated.

 Thank You.






Re: Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Jack Krupansky
There is also an XSLT update handler option to transform raw XML to Solr 
XML on the fly. If anybody here has used it, feel free to chime in.


See:
http://wiki.apache.org/solr/XsltUpdateRequestHandler
and
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-UsingXSLTtoTransformXMLIndexUpdates

-- Jack Krupansky

-Original Message- 
From: Eric_Peng

Sent: Wednesday, February 12, 2014 11:42 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about how to upload XML by using SolrJ Client Java 
Code


Thanks you so much Erick, I will try to write my owe XML parser







Using numeric ranges in Solr query

2014-02-12 Thread jay67
When the user enters a price in the price field, for example 1000 USD, I want to fetch all
items with a price around 1000 USD. I found in the documentation that I can use
price:[* to 1000]. That will get all items from 1 to 1000 USD,
but I want to get results where the price is between 900 and 1000 USD.

Any help is appreciated.

Thank You.





Re: Using numeric ranges in Solr query

2014-02-12 Thread Jack Krupansky

Is price a float/double field?

price:[99.5 TO 100.5]  -- price near 100

price:[900 TO 1000]

or

price:[899.5 TO 1000.5]

-- Jack Krupansky

-Original Message- 
From: jay67

Sent: Wednesday, February 12, 2014 12:03 PM
To: solr-user@lucene.apache.org
Subject: Using numeric ranges in Solr query

When user enter a price in price field, for Ex: 1000 USD, i want to fetch 
all

items with price around 1000 USD. I found in documentation that i can use
price:[* to 1000] like that. It will get all items with from 1 to 1000 USD.
But i want to get results where price is between 900 to 1000 USD.

Any help is appreciated.

Thank You.






Re: Set up embedded Solr container and cores programmatically to read their configs from the classpath

2014-02-12 Thread Alan Woodward
Hi Robert,

I don't think this is possible at the moment, but I hope to get 
https://issues.apache.org/jira/browse/SOLR-4478 in for Lucene/Solr 4.7, which 
should allow you to inject your own SolrResourceLoader implementation for core 
creation (it sounds as though you want to wrap the core's loader in a 
ClasspathResourceLoader).  You could try applying that patch to your setup and 
see if that helps you out.

Alan Woodward
www.flax.co.uk


On 11 Feb 2014, at 10:41, Robert Krüger wrote:

 Hi,
 
 I have an application with an embedded Solr instance (and I want to
 keep it embedded) and so far I have been setting up my Solr
 installation programmatically using folder paths to specify where the
 specific container or core configs are.
 
 I have used the CoreContainer methods createAndLoad and create using
 File arguments and this works fine. However, now I want to change this
 so that all configuration files are loaded from certain locations
 using the classloader but I have not been able to get this to work.
 
 E.g. I want to have my solr config located in the classpath at
 
 my/base/package/solr/conf
 
 and the core configs at
 
 my/base/package/solr/cores/core1/conf,
 my/base/package/solr/cores/core2/conf
 
 etc..
 
 Is this possible at all? Looking through the source code it seems that
 specifying classpath resources in such a qualified way is not
 supported but I may be wrong.
 
 I could get this to work for the container by supplying my own
 implementation of SolrResourceLoader that allows a base path to be
 specified for the resources to be loaded (I first thought that would
 happen already when specifying instanceDir accordingly, but looking at
 the code it does not: for resources loaded through the classloader,
 instanceDir is not prepended). However, then I am stuck with the
 loading of the cores' resources, as the respective code (see
 org.apache.solr.core.CoreContainer#createFromLocal) instantiates a
 SolrResourceLoader internally.
 
 Thanks for any help with this (be it a clarification that it is not possible).
 
 Robert



RE: Solr4 performance

2014-02-12 Thread Joshi, Shital
Does Solr4 load the entire index into a memory-mapped file? What is the eviction policy
of this memory-mapped file? Can we control it?

_
From: Joshi, Shital [Tech]
Sent: Wednesday, February 05, 2014 12:00 PM
To: 'solr-user@lucene.apache.org'
Subject: Solr4 performance


Hi,

We have SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute boxes 
(cloud). We're using local disk (/local/data) to store solr index files. All 
hosts have 60GB ram and Solr4 JVM are running with max 30GB heap size. So far 
we have 470 million documents. We are using custom sharding and all shards have 
~9-10 million documents. We have a GUI sending queries to this cloud and GUI 
has 30 seconds of timeout.

Lately we're getting many timeouts on the GUI, and upon checking we found that all
timeouts are happening on 2 hosts. The admin GUI for one of the hosts shows 96%
physical memory usage, but the other host looks perfectly good. Both hosts are for
different shards. Would increasing the RAM of these two hosts make these timeouts
go away? What else can we check?

Many Thanks!




Re: Solr4 performance

2014-02-12 Thread Shalin Shekhar Mangar
No, Solr doesn't load the entire index in memory. I think you'll find
Uwe's blog most helpful on this matter:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

On Thu, Feb 13, 2014 at 12:27 AM, Joshi, Shital shital.jo...@gs.com wrote:
 Does Solr4 load entire index in Memory mapped file? What is the eviction 
 policy of this memory mapped file? Can we control it?

 _
 From: Joshi, Shital [Tech]
 Sent: Wednesday, February 05, 2014 12:00 PM
 To: 'solr-user@lucene.apache.org'
 Subject: Solr4 performance


 Hi,

 We have SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute 
 boxes (cloud). We're using local disk (/local/data) to store solr index 
 files. All hosts have 60GB ram and Solr4 JVM are running with max 30GB heap 
 size. So far we have 470 million documents. We are using custom sharding and 
 all shards have ~9-10 million documents. We have a GUI sending queries to 
 this cloud and GUI has 30 seconds of timeout.

 Lately we're getting many timeouts on GUI and upon checking we found that all 
 timeouts are happening on 2 hosts. The admin GUI for one of the hosts show 
 96% of physical memory but the other host looks perfectly good. Both hosts 
 are for different shards. Would increasing ram of these two hosts make these 
 timeouts go away? What else we can check?

 Many Thanks!





-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr4 performance

2014-02-12 Thread Greg Walters
Shital,

Take a look at 
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's 
a pretty decent explanation of memory mapped files. I don't believe that the 
default configuration for solr is to use MMapDirectory but even if it does my 
understanding is that the entire file won't be forcibly cached by solr. The 
OS's filesystem cache should control what's actually in ram and the eviction 
process will depend on the OS.

Thanks,
Greg

On Feb 12, 2014, at 12:57 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Does Solr4 load entire index in Memory mapped file? What is the eviction 
 policy of this memory mapped file? Can we control it?
 
 _
 From: Joshi, Shital [Tech]
 Sent: Wednesday, February 05, 2014 12:00 PM
 To: 'solr-user@lucene.apache.org'
 Subject: Solr4 performance
 
 
 Hi,
 
 We have SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute 
 boxes (cloud). We're using local disk (/local/data) to store solr index 
 files. All hosts have 60GB ram and Solr4 JVM are running with max 30GB heap 
 size. So far we have 470 million documents. We are using custom sharding and 
 all shards have ~9-10 million documents. We have a GUI sending queries to 
 this cloud and GUI has 30 seconds of timeout.
 
 Lately we're getting many timeouts on GUI and upon checking we found that all 
 timeouts are happening on 2 hosts. The admin GUI for one of the hosts show 
 96% of physical memory but the other host looks perfectly good. Both hosts 
 are for different shards. Would increasing ram of these two hosts make these 
 timeouts go away? What else we can check?
 
 Many Thanks!
 
 



RE: Indexing spatial fields into SolrCloud (HTTP)

2014-02-12 Thread Beale, Jim (US-KOP)
Hi David,

I finally got back to this again, after getting sidetracked for a couple of 
weeks.

I implemented things in accordance with my understanding of what you wrote 
below.  Using SolrJ, the code to index the spatial field is as follows,

private void addSpatialField(double lat, double lon, SolrInputDocument 
document) {
   StringBuilder sb = new StringBuilder();
   sb.append(lat).append(",").append(lon);
   document.addField("location", sb.toString());
}

Using Solr 4.3.1 and spatial4j 0.3, I am getting the following error in the 
solr logs:

1518436 [qtp1529118084-24] ERROR org.apache.solr.core.SolrCore – 
org.apache.solr.common.SolrException: 
com.spatial4j.core.exception.InvalidShapeException: Unable to read: 
Pt(x=-72.544123,y=41.85)
at 
org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:144)
at 
org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:118)

spatial4j 0.3 is looking for something like POINT but Solr is converting my 
lat,long to Pt(x=-72.544123,y=41.85).

Version mismatch?

Thanks in advance for your help!

Jim Beale

From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Monday, January 13, 2014 11:30 AM
To: Beale, Jim (US-KOP); solr-user@lucene.apache.org
Subject: Re: Indexing spatial fields into SolrCloud (HTTP)

Hello Jim,

By the way, using GeohashPrefixTree.getMaxLevelsPossible() is usually an 
extreme choice.  Instead you probably want to choose only as many levels needed 
for your distance tolerance.  See SpatialPrefixTreeFactory which you can use 
outright or borrow the code it uses.

Looking at your code, I see you are coding against Solr directly in Java 
instead of where most people do this as an HTTP web service.  But it's unclear 
in what context your code runs because you are getting into the guts of things 
that you normally don't have to do, even if you're using SolrJ or writing some 
sort of UpdateRequestProcessor.  Simply configure the field type in schema.xml 
appropriately, and then to index a point simply give Solr a string for the 
field in latitude, longitude format.  I don't know why you are using 
field.tokenStream(analyzer) for the field value - that is clearly wrong and the 
cause of the error.  I think your confusion more has to do with differences in 
coding to Lucene versus Solr;  this being an actual spatial concern.  You 
referenced SpatialDemoUpdateProcessorFactory so I see you have looked at 
SolrSpatialSandbox on GitHub.  That particular URP should get some warnings 
added to it in the code to suggest that you probably should do what it does.  
If you look at the solrconfig.xml that configures it, there is a warning as 
follows:
  !-- spatial
   Only needed for an SpatialDemoUpdateProcessorFactory which copies  spatial 
objects from one field
to other spatial fields in object form to avoid redundant/inefficient 
string to spatial object de-serialization.
   --
Even if you have a similar circumstance, your code doesn't quite look like 
this URP.  You shouldn't need to reference the SpatialStrategy, for example.

~ David

From: Beale, Jim (US-KOP) jim.be...@hibu.com
Date: Friday, January 10, 2014 at 12:15 PM
To: solr-user@lucene.apache.org
Cc: Smiley, David W. dsmi...@mitre.org
Subject: Indexing spatial fields into SolrCloud (HTTP)

I am porting an application from Lucene to Solr which makes use of spatial4j 
for distance searches.  The Lucene version works correctly but I am having a 
problem getting the Solr version to work in the same way.

Lucene version:

SpatialContext geoSpatialCtx = SpatialContext.GEO;

   geoSpatialStrategy = new RecursivePrefixTreeStrategy(new 
GeohashPrefixTree(
 geoSpatialCtx, GeohashPrefixTree.getMaxLevelsPossible()), 
DocumentFieldNames.LOCATION);


   Point point = geoSpatialCtx.makePoint(lon, lat);
   for (IndexableField field : 
geoSpatialStrategy.createIndexableFields(point)) {
  document.add(field);
   }

   //Store the field
   document.add(new StoredField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point)));

Solr version:

   Point point = geoSpatialCtx.makePoint(lon, lat);
   for (IndexableField field : 
geoSpatialStrategy.createIndexableFields(point)) {
  try {
 solrDocument.addField(field.name(), 
field.tokenStream(analyzer));
  } catch (IOException e) {
 LOGGER.error(Failed to add geo field to Solr index, e);
  }
   }

   // Store the field
   solrDocument.addField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point));

The server-side error is as follows:

Caused by: com.spatial4j.core.exception.InvalidShapeException: Unable to read: 

filtering/faceting by a big list IDs

2014-02-12 Thread Tri Cao
Hi all,

I am running a Solr application and I would need to implement a feature that requires faceting and filtering on a large list of IDs. The IDs are stored outside of Solr and are specific to the current logged-on user. An example of this is the articles/tweets the user has read in the last few weeks. Note that the IDs here are the real document IDs and not Lucene internal docids.

So the question is what would be the best way to implement this in Solr? The list could be as large as tens of thousands of IDs. The obvious way of rewriting the Solr query to add the ID list as "facet.query" and "fq" doesn't seem to be the best way because: a) the query would be very long, and b) it would surely exceed the default limit of 1024 Boolean clauses, and I am sure the limit is there for a reason.

I had a similar problem before, but back then I was using Lucene directly and the way I solved it was to use a MultiTermQuery to retrieve the internal docids from the ID list and then apply the resulting DocSet to counting and filtering. It was working reasonably for lists of size ~10K, and with proper caching, it was working OK. My current application is so invested in Solr that going back to Lucene is not an option anymore.

All advice/suggestions are welcomed.

Thanks,
Tri

Re: Solr4 performance

2014-02-12 Thread Shawn Heisey

On 2/12/2014 12:07 PM, Greg Walters wrote:

Take a look at 
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's 
a pretty decent explanation of memory mapped files. I don't believe that the 
default configuration for solr is to use MMapDirectory but even if it does my 
understanding is that the entire file won't be forcibly cached by solr. The 
OS's filesystem cache should control what's actually in ram and the eviction 
process will depend on the OS.


I only have a little bit to add.  Here's the first thing that Uwe's blog 
post (linked above) says:


Since version 3.1, *Apache Lucene* and *Solr* use MMapDirectory by 
default on 64bit Windows and Solaris systems; since version 3.3 also for 
64bit Linux systems.


The default in Solr 4.x is NRTCachingDirectory, which uses MMapDirectory 
by default under the hood.


A summary about all this that should be relevant to the original question:

It's the *operating system* that handles memory mapping, including any 
caching that happens.  Assuming that you don't have a badly configured 
virtual machine setup, I'm fairly sure that only real memory gets used, 
never swap space on the disk.  If something else on the system makes a 
memory allocation, the operating system will instantly give up memory 
used for caching and mapping.  One of the strengths of mmap is that it 
can't exceed available resources unless it's used incorrectly.


Thanks,
Shawn



Re: filtering/faceting by a big list IDs

2014-02-12 Thread Joel Bernstein
Tri,

You will most likely need to implement a custom QParserPlugin to
efficiently handle what you described. Inside of this QParserPlugin you
could create the logic that would bring in your outside list of ID's and
build a DocSet that could be applied to the fq and the facet.query. I
haven't attempted to use a QParserPlugin with a facet.query, but in theory
it would work.

With the filter query you also have the option of implementing your Query
as a PostFilter. PostFilter logic is applied at collect time so the logic
needs to only be applied to the documents that match the query. In many
cases this can be faster, especially when result sets are relatively small
but the index is large.
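
For what it's worth, a very rough sketch of such a QParserPlugin (purely hypothetical: the class name, parser name and uniqueKey field are invented, and the raw ID list is passed in the query string rather than fetched from the external store) could look like this. It takes the simpler route of a constant-scoring terms filter over the uniqueKey field rather than the hand-built DocSet described above:

import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.TermsFilter;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class ExternalIdsQParserPlugin extends QParserPlugin {
    @Override
    public void init(NamedList args) {}

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            @Override
            public Query parse() throws SyntaxError {
                // In a real plugin the query string would more likely be a user key that is
                // used to fetch the ID list from the external system; here it is the raw list.
                List<Term> terms = new ArrayList<Term>();
                for (String id : qstr.split(",")) {
                    terms.add(new Term("id", id.trim()));   // "id" = the uniqueKey field (assumption)
                }
                // Constant-scoring filter over the uniqueKey field; usable in fq and facet.query.
                return new ConstantScoreQuery(new TermsFilter(terms));
            }
        };
    }
}

Registered in solrconfig.xml under a name such as extids, it could then be used as fq={!extids}id1,id2,id3 and, in principle, in a facet.query as well; since it is not a BooleanQuery, it is not subject to the 1024 boolean-clause limit mentioned in the original question.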


Joel Bernstein
Search Engineer at Heliosearch


On Wed, Feb 12, 2014 at 2:12 PM, Tri Cao tm...@me.com wrote:

 Hi all,

 I am running a Solr application and I would need to implement a feature
 that requires faceting and filtering on a large list of IDs. The IDs are
 stored outside of Solr and is specific to the current logged on user. An
 example of this is the articles/tweets the user has read in the last few
 weeks. Note that the IDs here are the real document IDs and not Lucene
 internal docids.

 So the question is what would be the best way to implement this in Solr?
 The list could be as large as a ten of thousands of IDs. The obvious way of
 rewriting Solr query to add the ID list as facet.query and fq doesn't
 seem to be the best way because: a) the query would be very long, and b) it
 would surely exceed that the default limit of 1024 Boolean clauses and I
 am sure the limit is there for a reason.

 I had a similar problem before but back then I was using Lucene directly
 and the way I solved it is to use a MultiTermQuery to retrieve the internal
 docids from the ID list and then apply the resulting DocSet to counting and
 filtering. It was working reasonably for list of size ~10K, and with proper
 caching, it was working ok. My current application is very invested in Solr
 that going back to Lucene is not an option anymore.

 All advice/suggestion are welcomed.

 Thanks,
 Tri




Re: Indexing spatial fields into SolrCloud (HTTP)

2014-02-12 Thread Smiley, David W.
That’s pretty weird.  It appears that somehow a Spatial4j Point class is having 
its toString() called on it (which looks like Pt(x=-72.544123,y=41.85) ) 
and then Spatial4j is trying to parse this, which isn’t in a valid format — the 
toString is more for debug-ability.  Your SolrJ code looks totally fine.  Perhaps 
you’ve got some funky UpdateRequestProcessor from experimentation you’ve done 
that’s parsing then toString’ing it?  And also, your stack trace should have 
more to it than what you present here.  It may be on the server-side versus the 
HTTP error page which can get abbreviated.

~ David

From: Beale, Jim (US-KOP) jim.be...@hibu.com
Date: Wednesday, February 12, 2014 at 2:05 PM
To: Smiley, David W. dsmi...@mitre.org, solr-user@lucene.apache.org
Subject: RE: Indexing spatial fields into SolrCloud (HTTP)

Hi David,

I finally got back to this again, after getting sidetracked for a couple of 
weeks.

I implemented things in accordance with my understanding of what you wrote 
below.  Using SolrJ, the code to index the spatial field is as follows,

private void addSpatialField(double lat, double lon, SolrInputDocument document) 
{
   StringBuilder sb = new StringBuilder();
   sb.append(lat).append(",").append(lon);
   document.addField("location", sb.toString());
}

Using Solr 4.3.1 and spatial4j 0.3, I am getting the following error in the 
solr logs:

1518436 [qtp1529118084-24] ERROR org.apache.solr.core.SolrCore – 
org.apache.solr.common.SolrException: 
com.spatial4j.core.exception.InvalidShapeException: Unable to read: 
Pt(x=-72.544123,y=41.85)
at 
org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:144)
at 
org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:118)

spatial4j 0.3 is looking for something like POINT but Solr is converting my 
“lat,long” to Pt(x=-72.544123,y=41.85).

Version mismatch?

Thanks in advance for your help!

Jim Beale

From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Monday, January 13, 2014 11:30 AM
To: Beale, Jim (US-KOP); solr-user@lucene.apache.org
Subject: Re: Indexing spatial fields into SolrCloud (HTTP)

Hello Jim,

By the way, using GeohashPrefixTree.getMaxLevelsPossible() is usually an 
extreme choice.  Instead you probably want to choose only as many levels needed 
for your distance tolerance.  See SpatialPrefixTreeFactory which you can use 
outright or borrow the code it uses.

Looking at your code, I see you are coding against Solr directly in Java 
instead of where most people do this as an HTTP web service.  But it’s unclear 
in what context your code runs because you are getting into the guts of things 
that you normally don’t have to do, even if you’re using SolrJ or writing some 
sort of UpdateRequestProcessor.  Simply configure the field type in schema.xml 
appropriately, and then to index a point simply give Solr a string for the 
field in “latitude, longitude” format.  I don’t know why you are using 
field.tokenStream(analyzer) for the field value — that is clearly wrong and the 
cause of the error.  I think your confusion more has to do with differences in 
coding to Lucene versus Solr;  this being an actual spatial concern.  You 
referenced “SpatialDemoUpdateProcessorFactory” so I see you have looked at 
SolrSpatialSandbox on GitHub.  That particular URP should get some warnings 
added to it in the code to suggest that you probably should do what it does.  
If you look at the solrconfig.xml that configures it, there is a warning as 
follows:
  !-- spatial
   Only needed for an SpatialDemoUpdateProcessorFactory which copies  spatial 
objects from one field
to other spatial fields in object form to avoid redundant/inefficient 
string to spatial object de-serialization.
   --
Even if you have a similar circumstance, your code doesn’t quite look like 
this URP.  You shouldn’t need to reference the SpatialStrategy, for example.

~ David

From: Beale, Jim (US-KOP) jim.be...@hibu.com
Date: Friday, January 10, 2014 at 12:15 PM
To: solr-user@lucene.apache.org
Cc: Smiley, David W. dsmi...@mitre.org
Subject: Indexing spatial fields into SolrCloud (HTTP)

I am porting an application from Lucene to Solr which makes use of spatial4j 
for distance searches.  The Lucene version works correctly but I am having a 
problem getting the Solr version to work in the same way.

Lucene version:

SpatialContext geoSpatialCtx = SpatialContext.GEO;

   geoSpatialStrategy = new RecursivePrefixTreeStrategy(new 
GeohashPrefixTree(
 geoSpatialCtx, GeohashPrefixTree.getMaxLevelsPossible()), 

Re: Solr4 performance

2014-02-12 Thread Roman Chyla
And perhaps one other, but very pertinent, recommendation is: allocate only
as much heap as is necessary. By allocating more, you are working against
the OS caching. Knowing how much is enough is a bit tricky, though.

Best,

  roman


On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/12/2014 12:07 PM, Greg Walters wrote:

 Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-
 on-64bit.html as it's a pretty decent explanation of memory mapped
 files. I don't believe that the default configuration for solr is to use
 MMapDirectory but even if it does my understanding is that the entire file
 won't be forcibly cached by solr. The OS's filesystem cache should control
 what's actually in ram and the eviction process will depend on the OS.


 I only have a little bit to add.  Here's the first thing that Uwe's blog
 post (linked above) says:

 Since version 3.1, *Apache Lucene* and *Solr* use MMapDirectory by default
 on 64bit Windows and Solaris systems; since version 3.3 also for 64bit
 Linux systems.

 The default in Solr 4.x is NRTCachingDirectory, which uses MMapDirectory
 by default under the hood.

 A summary about all this that should be relevant to the original question:

 It's the *operating system* that handles memory mapping, including any
 caching that happens.  Assuming that you don't have a badly configured
 virtual machine setup, I'm fairly sure that only real memory gets used,
 never swap space on the disk.  If something else on the system makes a
 memory allocation, the operating system will instantly give up memory used
 for caching and mapping.  One of the strengths of mmap is that it can't
 exceed available resources unless it's used incorrectly.

 Thanks,
 Shawn




Re: Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Paul Libbrecht
Navaa,

you need query expansion for that.
E.g. if your query goes through dismax, you need to add the two field names to 
the qf parameter.
The nice thing is that qf can be:
  text^3.0 text.stemmed^2 text.phonetic^1
And thus exact matches are preferred to stemmed or phonetic matches.

This is configured in solrconfig.xml.
It's quite common to create your own query component to do more than just 
dismax for this.
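
As a quick illustration, the same weighting can also be supplied per request from SolrJ; a minimal sketch using the field names from this thread (the core URL is an assumption, and exception handling is omitted):

// Requires SolrJ: SolrQuery, HttpSolrServer, QueryResponse.
SolrQuery q = new SolrQuery("stephen");
q.set("defType", "edismax");                  // dismax-style parser that understands qf
q.set("qf", "fname^3.0 fname_sound^1.0");     // exact field boosted above the phonetic copy
QueryResponse rsp = new HttpSolrServer("http://localhost:8983/solr/collection1").query(q);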

hope it helps.

paul


On 12 Feb 2014 at 14:22, Navaa navnath.thomb...@xtremumsolutions.com wrote:

 Hi,
 I am using solr for searching phoneticly equivalent string
 my schema contains...
 fieldType name=text_general_doubleMetaphone class=solr.TextField
 positionIncrementGap=100
   analyzer type=index
   tokenizer class=solr.StandardTokenizerFactory /
   filter class=solr.PhoneticFilterFactory 
 encoder=DoubleMetaphone
 inject=true/
   filter class=solr.LowerCaseFilterFactory/ 
   /analyzer
   analyzer type=query
   tokenizer class=solr.StandardTokenizerFactory /
   filter class=solr.PhoneticFilterFactory 
 encoder=DoubleMetaphone
 inject=true/
   filter class=solr.LowerCaseFilterFactory/ 
   /analyzer
 /fieldType
 
 and fields are
 
 field name=fname type=text_general indexed=true stored=true
 required=false /
 field name=fname_sound type=text_general_doubleMetaphone indexed=true
 stored=true required=false /
   copyField source=fname dest=fname_sound /
 
 
 it works when I search stfn ===> stephen, stephn
 
 But I am expecting stephn ===> stephen as well.
 How will I get this result? Am I doing something wrong?
 
 Thanks in advance
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Searching-phonetic-by-DoubleMetaphone-soundex-encoder-tp4116885.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Optimize Index in solr 4.6

2014-02-12 Thread Shawn Heisey

On 2/6/2014 4:00 AM, Shawn Heisey wrote:

I would not recommend it, but if you know for sure that your
infrastructure can handle it, then you should be able to optimize them
all at once by sending parallel optimize requests with distrib=false
directly to the Solr cores that hold the shard replicas, not the collection.


Followup on this thread:

Evidence now suggests (thank you, Yago!) that sending an optimize 
request with distrib=false might *NOT* optimize just the core that 
receives the request.  I can confirm that this is the case on a 
SolrCloud 4.2.1 setup with one shard and replicationFactor=2.  It 
optimized that core, then when that was finished, optimized the other 
replica.


I would have already filed an issue in Jira, except that I do not 
currently have any way to test this on 4.6.1, so I do not know if this 
is still the way it works.  Also, I do not have a distributed SolrCloud 
index available.  I will be looking into writing a unit test, but my 
grasp of SolrCloud tests is very weak.


Thanks,
Shawn



RE: Indexing spatial fields into SolrCloud (HTTP)

2014-02-12 Thread Beale, Jim (US-KOP)
Hi David,

You wrote:

 Perhaps you’ve got some funky UpdateRequestProcessor from experimentation 
 you’ve done that’s parsing then toString’ing it?

No, nothing at all.  The update processing is straight out-of-the-box Solr.

 And also, your stack trace should have more to it than what you present here.

I trimmed the stack trace because it seemed like TMI, but here it is for 
completeness’ sake:

1518436 [qtp1529118084-24] ERROR org.apache.solr.core.SolrCore – 
org.apache.solr.common.SolrException: 
com.spatial4j.core.exception.InvalidShapeException: Unable to read: 
Pt(x=-72.544123,y=41.85)
at 
org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:144)
at 
org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:118)
at 
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:186)
at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:257)
at 
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:199)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:504)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:640)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:396)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
…..

 Your SolrJ code looks totally fine.

Finally, I changed the code to the following,

private void addSpatialLcnFields(double lat, double lon, SolrInputDocument 
document) {
   Point point = geoSpatialCtx.makePoint(lon, lat);
   document.addField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point));
}

and at least it isn’t throwing exceptions now.  I’m not sure what is going into 
the index yet.  I’ll have to wait for it to finish.  (

Does that code seem correct?  I want to avoid the deprecated API but so far I 
haven’t found any alternatives which work.

Thanks,

Jim Beale

From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, February 12, 2014 3:07 PM
To: Beale, Jim (US-KOP); solr-user@lucene.apache.org
Subject: Re: Indexing spatial fields into SolrCloud (HTTP)

That’s pretty weird.  It appears that somehow a Spatial4j Point class is having 
it’s toString() called on it (which looks like Pt(x=-72.544123,y=41.85) ) 
and then Spatial4j is trying to parse this which isn’t in a valid format — the 
toString is more debug-ability.  Your SolrJ code looks totally fine.  Perhaps 
you’ve got some funky UpdateRequestProcessor from experimentation you’ve done 
that’s parsing then toString’ing it?  And also, your stack trace should have 
more to it than what you present here.  It may be on the server-side versus the 
HTTP error page which can get abbreviated.

~ David

From: Beale, Jim (US-KOP) jim.be...@hibu.com
Date: Wednesday, February 12, 2014 at 2:05 PM
To: Smiley, David W. dsmi...@mitre.org, solr-user@lucene.apache.org
Subject: RE: Indexing spatial fields into SolrCloud (HTTP)

Hi David,

I finally got back to this again, after getting sidetracked for a couple of 
weeks.

I implemented things in accordance with my understanding of what you wrote 
below.  Using SolrJ, the code to index the spatial field is as follows,

privatevoid addSpatialField(double lat, double lon, 

Re: Indexing spatial fields into SolrCloud (HTTP)

2014-02-12 Thread Smiley, David W.
Your new code should also work, and should be equivalent.

The longer stack trace you have is of the wrapping SolrException which wraps 
another exception — InvalidShapeException.  You should also see the stack trace 
of InvalidShapeException which should originate out of Spatial4j.

~ David

From: Beale, Jim (US-KOP) jim.be...@hibu.com
Date: Wednesday, February 12, 2014 at 5:21 PM
To: Smiley, David W. dsmi...@mitre.org, solr-user@lucene.apache.org
Subject: RE: Indexing spatial fields into SolrCloud (HTTP)

Hi David,

You wrote:

 Perhaps you’ve got some funky UpdateRequestProcessor from experimentation 
 you’ve done that’s parsing then toString’ing it?

No, nothing at all.  The update processing is straight out-of-the-box Solr.

 And also, your stack trace should have more to it than what you present here.

I trimmed the stack trace because it seemed like TMI, but here it is for 
completeness’ sake:

1518436 [qtp1529118084-24] ERROR org.apache.solr.core.SolrCore – 
org.apache.solr.common.SolrException: 
com.spatial4j.core.exception.InvalidShapeException: Unable to read: 
Pt(x=-72.544123,y=41.85)
at 
org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:144)
at 
org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:118)
at 
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:186)
at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:257)
at 
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:199)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:504)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:640)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:396)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
…..

 Your SolrJ code looks totally fine.

Finally, I changed the code to the following,

private void addSpatialLcnFields(double lat, double lon, SolrInputDocument 
document) {
   Point point = geoSpatialCtx.makePoint(lon, lat);
   document.addField(geoSpatialStrategy.getFieldName(), 
geoSpatialCtx.toString(point));
}

and at least it isn’t throwing exceptions now.  I’m not sure what is going into 
the index yet.  I’ll have to wait for it to finish.  (

Does that code seem correct?  I want to avoid the deprecated API but so far I 
haven’t found any alternatives which work.

Thanks,

Jim Beale

From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, February 12, 2014 3:07 PM
To: Beale, Jim (US-KOP); solr-user@lucene.apache.org
Subject: Re: Indexing spatial fields into SolrCloud (HTTP)

That’s pretty weird.  It appears that somehow a Spatial4j Point class is having 
it’s toString() called on it (which looks like Pt(x=-72.544123,y=41.85) ) 
and then Spatial4j is trying to parse this which isn’t in a valid format — the 
toString is more debug-ability.  Your SolrJ code looks totally fine.  Perhaps 
you’ve got some funky UpdateRequestProcessor from experimentation you’ve done 
that’s parsing then toString’ing it?  And also, your stack trace should have 
more to it than what you present here.  It may be on the server-side versus the 
HTTP error page which can get 

Weird issue with q.op=AND

2014-02-12 Thread Shamik Bandopadhyay
Hi,

  I'm facing a weird problem while using the q.op=AND condition. It looks like it
gets into some conflict if I use multiple appends conditions in
conjunction. It works as long as I have one filtering condition in appends.

lst name=appends
   str name=fqSource:TestHelp/str
/lst

Now, the moment I add an additional parameter, search stops returning any
result.

lst name=appends
   str name=fqSource:TestHelp | Source:TestHelp2/str
/lst

If I remove q.op=AND from request handler, I get results back. Data is
present for both the Source I'm using, so it's not a filtering issue. Even
a blank query fails to return data.

Here's my request handler.

requestHandler name=/testhandler class=solr.SearchHandler
lst name=defaults
str name=echoParamsexplicit/str
float name=tie0.01/float
str name=wtvelocity/str
str name=v.templatebrowse/str
str name=v.contentTypetext/html;charset=UTF-8/str
str name=v.layoutlayout/str
str name=v.channeltesthandler/str
str name=defTypeedismax/str
str name=q.opAND/str
str name=q.alt*:*/str
str name=rows15/str
str name=flid,url,Source2,text/str
str name=qftext^1.5 title^2/str
str name=bqSource:TestHelp^3 Source:TestHelp2^0.85/str
str name=bfrecip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0/str
str name=dftext/str

!-- facets --
str name=faceton/str
str name=facet.mincount1/str
str name=facet.limit100/str
str name=facet.fieldlanguage/str
str name=facet.fieldSource/str
 !-- Highlighting defaults --
str name=hltrue/str
str name=hl.fltext title/str
str name=f.text.hl.fragsize250/str
str name=f.text.hl.alternateFieldShortDesc/str

!-- Spell check settings --
str name=spellchecktrue/str
str name=spellcheck.dictionarydefault/str
str name=spellcheck.collatetrue/str
str name=spellcheck.onlyMorePopularfalse/str
str name=spellcheck.extendedResultsfalse/str
str name=spellcheck.count1/str

   !-- Shard Tolerant --
str name=shards.toleranttrue/str
/lst
lst name=appends
str name=fqSource:TestHelp | Source2:TestHelp2/str
/lst
arr name=last-components
strspellcheck/str
/arr
/requestHandler

Not sure what's going wrong. I'm using a SolrCloud environment with 2
shards having a replica each.

Any pointers will be appreciated.

Thanks,
Shamik


Re: Weird issue with q.op=AND

2014-02-12 Thread Shawn Heisey

On 2/12/2014 3:32 PM, Shamik Bandopadhyay wrote:

Hi,

   I'm facing a weird problem while using q.op=AND condition. Looks like it
gets into some conflict if I use multiple appends condition in
conjunction. It works as long as I've one filtering condition in appends.

lst name=appends
str name=fqSource:TestHelp/str
/lst

Now, the moment I add an additional parameter, search stops returning any
result.

lst name=appends
str name=fqSource:TestHelp | Source:TestHelp2/str
/lst

If I remove q.op=AND from request handler, I get results back. Data is
present for both the Source I'm using, so it's not a filtering issue. Even
a blank query fails to return data.


I'm pretty sure that's not valid Solr query syntax for what you're 
trying to do.  Try this instead, although with these specific examples I 
would leave the quotes out of the fq value:


lst name=appends
   str name=fqSource:(TestHelp OR TestHelp2)/str
/lst

It's pretty much accidental (a result of the query analysis chain) that 
it was working when you didn't have q.op=AND.  You can verify what I'm 
saying by looking at the parsed query when adding debugQuery=true as a 
query option.
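
For example, a small SolrJ sketch that requests the debug output (the core URL is an assumption; exception handling omitted):

// Requires SolrJ: SolrQuery, HttpSolrServer, QueryResponse.
SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("Source:(TestHelp OR TestHelp2)");
q.set("debugQuery", true);
QueryResponse rsp = new HttpSolrServer("http://localhost:8983/solr/collection1").query(q);
// The debug section includes the parsed filter queries, which shows how the fq was interpreted.
System.out.println(rsp.getDebugMap().get("parsed_filter_queries"));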


Thanks,
Shawn



Re: Weird issue with q.op=AND

2014-02-12 Thread shamik
Thanks a lot Shawn. Changing the appends filtering based on your suggestion
worked. The part which confused me bigtime is the syntax I've been using so
far without an issue (barring the q.op part).


lst name=appends 
 str name=fqSource:TestHelp | Source:downloads |
-AccessMode:internal | -workflowparentid:[* TO *]/str 
/lst

This has been working as expected and applies the filter correctly. Just
curious, if it's an invalid syntax, how is Solr handling this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Weird-issue-with-q-op-AND-tp4117013p4117022.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Weird issue with q.op=AND

2014-02-12 Thread Shawn Heisey

On 2/12/2014 4:58 PM, shamik wrote:

Thanks a lot Shawn. Changing the appends filtering based on your suggestion
worked. The part which confused me bigtime is the syntax I've been using so
far without an issue (barring the q.op part).


lst name=appends
  str name=fqSource:TestHelp | Source:downloads |
-AccessMode:internal | -workflowparentid:[* TO *]/str
/lst

This has been working as expected and applies the filter correctly. Just
curious, if its an invalid syntax, how's Solr handling this ?


Honestly, I can't really say what's going on here.  After I got this, I 
tried some example queries like that and they do seem to be parsed 
right.  You could try turning on debugQuery for the query that 
doesn't work and see if you can see what the problem is.  I had never 
seen a query syntax with | in it before.  The other syntax is a little 
more explicit, though.


Thanks,
Shawn



change character correspondence in icu lib

2014-02-12 Thread alxsss
Hello,

I use
icu4j-49.1.jar,
lucene-analyzers-icu-4.6-SNAPSHOT.jar

for one of the fields in the form 

filter class=solr.ICUFoldingFilterFactory /

I need to change one of the accent char's corresponding letter. I made changes 
to this file

lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt

recompiled solr and lucene and replaced the above jars with new ones, but no 
change in the indexing and parsing of keywords.

Any ideas where the appropriate change must be made?

Thanks.
Alex.





Re: change character correspondence in icu lib

2014-02-12 Thread Alexandre Rafalovitch
Not a direct answer, but the usual next question is: are you
absolutely sure you are using the right jars? Try renaming them and
restarting Solr. If it complains, you got the right ones. If not...
Also, unzip those jars and see if your file made it all the way
through the build pipeline.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Feb 13, 2014 at 8:12 AM,  alx...@aim.com wrote:
 Hello,

 I use
 icu4j-49.1.jar,
 lucene-analyzers-icu-4.6-SNAPSHOT.jar

 for one of the fields in the form

 filter class=solr.ICUFoldingFilterFactory /

 I need to change one of the accent char's corresponding letter. I made 
 changes to this file

 lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt

 recompiled solr and lucene and replaced the above jars with new ones, but no 
 change in the indexing and parsing of keywords.

 Any ideas where the appropriate change must be made?

 Thanks.
 Alex.





Re: Weird issue with q.op=AND

2014-02-12 Thread shamik
Thanks, I'll take a look at the debug data.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Weird-issue-with-q-op-AND-tp4117013p4117047.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing strategies?

2014-02-12 Thread manju16832003
Hi Erick,
Thank you very much, those are valuable suggestions :-).
I will give it a try.
Appreciate your time.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-strategies-tp4116852p4117050.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Weird issue with q.op=AND

2014-02-12 Thread Jack Krupansky
Did you mean to use || for the OR operator? A single | is not treated as 
an operator - it will be treated as a term and sent through normal term 
analysis.


-- Jack Krupansky

-Original Message- 
From: Shamik Bandopadhyay

Sent: Wednesday, February 12, 2014 5:32 PM
To: solr-user@lucene.apache.org
Subject: Weird issue with q.op=AND

Hi,

 I'm facing a weird problem while using q.op=AND condition. Looks like it
gets into some conflict if I use multiple appends condition in
conjunction. It works as long as I've one filtering condition in appends.

lst name=appends
  str name=fqSource:TestHelp/str
/lst

Now, the moment I add an additional parameter, search stops returning any
result.

lst name=appends
  str name=fqSource:TestHelp | Source:TestHelp2/str
/lst

If I remove q.op=AND from request handler, I get results back. Data is
present for both the Source I'm using, so it's not a filtering issue. Even
a blank query fails to return data.

Here's my request handler.

requestHandler name=/testhandler class=solr.SearchHandler
lst name=defaults
str name=echoParamsexplicit/str
float name=tie0.01/float
str name=wtvelocity/str
str name=v.templatebrowse/str
str name=v.contentTypetext/html;charset=UTF-8/str
str name=v.layoutlayout/str
str name=v.channeltesthandler/str
str name=defTypeedismax/str
str name=q.opAND/str
str name=q.alt*:*/str
str name=rows15/str
str name=flid,url,Source2,text/str
str name=qftext^1.5 title^2/str
str name=bqSource:TestHelp^3 Source:TestHelp2^0.85/str
str name=bfrecip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0/str
str name=dftext/str

!-- facets --
str name=faceton/str
str name=facet.mincount1/str
str name=facet.limit100/str
str name=facet.fieldlanguage/str
str name=facet.fieldSource/str
!-- Highlighting defaults --
str name=hltrue/str
str name=hl.fltext title/str
str name=f.text.hl.fragsize250/str
str name=f.text.hl.alternateFieldShortDesc/str

!-- Spell check settings --
str name=spellchecktrue/str
str name=spellcheck.dictionarydefault/str
str name=spellcheck.collatetrue/str
str name=spellcheck.onlyMorePopularfalse/str
str name=spellcheck.extendedResultsfalse/str
str name=spellcheck.count1/str

  !-- Shard Tolerant --
   str name=shards.toleranttrue/str
/lst
lst name=appends
str name=fqSource:TestHelp | Source2:TestHelp2/str
/lst
arr name=last-components
strspellcheck/str
/arr
/requestHandler

Not sure what's going wrong. I'm using a SolrCloud environment with 2
shards having a replica each.

Any pointers will be appreciated.

Thanks,
Shamik 



Limit amount of search result

2014-02-12 Thread rachun
Dear all gurus,


I would like to limit the number of search results per shop. Let's say I have many shops
which are selling shirts. So when I search for white shirt, I want to return at most a
given number of results per shop (e.g. 5).

The result should be like this...
- Shop A
- Shop A
- Shop B
- Shop B
- Shop B
- Shop B
- Shop B
- Shop C
- Shop C
- Shop C

Any suggestion would be very much appreciated.

Thank you very much,
Chun.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Limit-amount-of-search-result-tp4117062.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Limit amount of search result

2014-02-12 Thread Sameer Maggon
Chun,

Have you looked at Grouping / Field Collapsing feature in solr?

https://wiki.apache.org/solr/FieldCollapsing

If shop is one of your fields, you can use field collapsing on that field
with a maximum of 'n' results to return per field value (or group).
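
For illustration, a minimal SolrJ sketch of such a grouped query (the field name "shop" and the core URL are assumptions; exception handling omitted):

// Requires SolrJ: SolrQuery, HttpSolrServer, QueryResponse, GroupCommand, Group.
SolrQuery q = new SolrQuery("white shirt");
q.set("group", true);              // enable result grouping (field collapsing)
q.set("group.field", "shop");      // collapse on the shop field
q.set("group.limit", 5);           // at most 5 documents returned per shop
QueryResponse rsp = new HttpSolrServer("http://localhost:8983/solr/collection1").query(q);
for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
    for (Group group : cmd.getValues()) {
        System.out.println(group.getGroupValue() + ": " + group.getResult().size() + " docs shown");
    }
}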

Sameer.
--
www.measuredsearch.com
tw: measuredsearch

On Wednesday, February 12, 2014, rachun rachun.c...@gmail.com wrote:

 Dear all gurus,


 I would like to limit amount of search result, let's say I have many shop
 which is selling shirt. So when I search white shirt I want to give a
 maximum number per shop (ex. 5).

 The result should be like this...
 - Shop A
 - Shop A
 - Shop B
 - Shop B
 - Shop B
 - Shop B
 - Shop B
 - Shop C
 - Shop C
 - Shop C

 any suggestion would be very appreciate.

 Thank you very much,
 Chun.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Limit-amount-of-search-result-tp4117062.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Sameer Maggon
Founder, Measured Search
m: 310.344.7266
tw: @measuredsearch
w: http://www.measuredsearch.com


Re: Join Scoring

2014-02-12 Thread anand chandak

Re-posting...



Thanks,

Anand


On 2/12/2014 10:55 AM, anand chandak wrote:

Thanks David, really helpful response.

You mentioned that if we have to add scoring support in Solr then a 
possible approach would be to add a custom QueryParser, which might be 
using Lucene's JOIN module.  I have tried this approach and this 
makes it slow, because I believe this is making more searches.


Curious, is it possible instead to enhance the existing Solr 
JoinQParserPlugin and add the scoring support in the same class? 
Do you think it's feasible and recommended? If yes, what would it take 
(high level) - in terms of code changes, any pointers?


Thanks,

Anand


On 2/12/2014 10:31 AM, David Smiley (@MITRE.org) wrote:

Hi Anand.

Solr's JOIN query, {!join}, constant-scores.  It's simpler and faster and
more memory efficient (particularly the worst-case memory use) to implement
the JOIN query without scoring, so that's why.  Of course, you might want it
to score and pay whatever penalty is involved.  For that you'll need to
write a Solr QueryParser that might use Lucene's join module which has
scoring variants.  I've taken this approach before.

You asked a specific question about the purpose of JoinScorer when it
doesn't actually score.  Lucene's Query produces a Weight which in turn
produces a Scorer that is a DocIdSetIterator plus it returns a score.  So
Queries have to have a Scorer to match any document even if the score is
always 1.
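
To illustrate that suggestion, a rough, hypothetical sketch of a scoring join parser built on Lucene's JoinUtil (all class and parameter names are invented, and this makes no claim about how Solr's own {!join} is implemented) could look like this:

import java.io.IOException;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class ScoringJoinQParserPlugin extends QParserPlugin {
    @Override
    public void init(NamedList args) {}

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            @Override
            public Query parse() throws SyntaxError {
                String fromField = localParams.get("from");
                String toField = localParams.get("to");
                Query fromQuery = subQuery(qstr, null).getQuery();   // the inner "from" query
                try {
                    // ScoreMode.Max / Total / Avg are the scoring variants plain {!join} lacks.
                    return JoinUtil.createJoinQuery(fromField, false, toField, fromQuery,
                                                    req.getSearcher(), ScoreMode.Max);
                } catch (IOException e) {
                    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
                }
            }
        };
    }
}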

Solr does indeed have a lot of caching; that may be in play here when
comparing against a quick attempt at using Lucene directly.  In particular,
the matching documents are likely to end up in Solr's DocumentCache.
Returning stored fields that come back in search results is one of the more
expensive things Lucene/Solr does.

I also think you noted that the fields on documents from the from side of
the query are not available to be returned in search results, just the to
side.  Yup; that's true.  To remedy this, you might write a Solr
SearchComponent that adds fields from the from side.  That could be tricky
to do; it would probably need to re-run the from-side query but filtered to
the matching top-N documents being returned.

~ David


anand chandak wrote

Resending, if somebody can please respond.


Thanks,

Anand


On 2/5/2014 6:26 PM, anand chandak wrote:
Hi,

Having a question on join score, why doesn't the solr join query return
the scores. Looking at the code, I see there's JoinScorer defined in
the  JoinQParserPlugin class ? If its not used for scoring ? where 
is it

actually used.

Also, to evaluate the performance of solr join plugin vs lucene
joinutil, I filed same join query against same data-set and same schema
and in the results, I am always seeing the Qtime for Solr much lower
then lucenes. What is the reason behind this ?  Solr doesn't return
scores could that cause so much difference ?

My guess is solr has very sophisticated caching mechanism and that 
might

be coming in play, is that true ? or there's difference in the way JOIN
happens in the 2 approach.

If I understand correctly both the implementation are using 2 pass
approach - first all the terms from fromField and then returns all
documents that have matching terms in a toField

If somebody can throw some light, would highly appreciate.

Thanks,
Anand




-
  Author: 
http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-Scoring-tp4115539p4116818.html

Sent from the Solr - User mailing list archive at Nabble.com.







Re: Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Paul Libbrecht
Navaa,
You need the query to be sent to the two fields. In dismax, this is easy.
Paul

On 12 February 2014 14:22:33 CET, Navaa navnath.thomb...@xtremumsolutions.com 
wrote:
Hi,
I am using solr for searching phoneticly equivalent string
my schema contains...
fieldType name=text_general_doubleMetaphone class=solr.TextField
positionIncrementGap=100
   analyzer type=index
   tokenizer class=solr.StandardTokenizerFactory /
   filter class=solr.PhoneticFilterFactory 
 encoder=DoubleMetaphone
inject=true/
   filter class=solr.LowerCaseFilterFactory/ 
   /analyzer
   analyzer type=query
   tokenizer class=solr.StandardTokenizerFactory /
   filter class=solr.PhoneticFilterFactory 
 encoder=DoubleMetaphone
inject=true/
   filter class=solr.LowerCaseFilterFactory/ 
   /analyzer
 /fieldType

and fields are

field name=fname type=text_general indexed=true stored=true
required=false /
field name=fname_sound type=text_general_doubleMetaphone
indexed=true
stored=true required=false /
   copyField source=fname dest=fname_sound /


it works when I search stfn ===> stephen, stephn

But I am expecting stephn ===> stephen as well.
How will I get this result? Am I doing something wrong?

Thanks in advance



--
View this message in context:
http://lucene.472066.n3.nabble.com/Searching-phonetic-by-DoubleMetaphone-soundex-encoder-tp4116885.html
Sent from the Solr - User mailing list archive at Nabble.com.

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

APACHE SOLR: Pass a file as query parameter and then parse each line to form a criteria

2014-02-12 Thread rajeev.nadgauda
Hi ,
I am new to Solr, and I need help with the following.

PROBLEM: I have a huge file of 1 lines and I want this to be an inclusion or
exclusion in the query, i.e. each line like (line1 OR line2 OR ...).

How can this be achieved in Solr? Is there a custom implementation that I
would need to write? Also, would it help to implement a custom filter?


Thank You,
Rajeev Nadgauda.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/APACHE-SOLR-Pass-a-file-as-query-parameter-and-then-parse-each-line-to-form-a-criteria-tp4117066.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr delta indexing approach

2014-02-12 Thread lalitjangra
Hi,

I am working on a prototype where I have a content source and I am indexing
all documents and storing the index in Solr. Now I have a pre-condition that my
content source is ever changing, meaning there is always new content added to
it. I have read that Solr indexes the full source every time it is asked to
index, but this may lead to under-utilization of resources as the same documents
are getting reindexed again and again. Is there any approach to handle such
scenarios? E.g. I have 10000 documents in my source which have been indexed by
Solr till today, but the next day my source has 11000 documents. So I want to
index only the new 1000 documents, not all 11000. Can anybody suggest an
approach for this? Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117068.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr delta indexing approach

2014-02-12 Thread Alexandre Rafalovitch
You have read that Solr needs to reindex a full source. That's correct
(unless you use atomic updates). But - the important point is - this
is per document. So, once you indexed your 1 documents, you don't
need to worry about them until they change.

Just go ahead and index your additional documents only. I am assuming
your source system can figure out what the new ones are (timestamp,
etc).

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Feb 13, 2014 at 12:15 PM, lalitjangra lalit.j.jan...@gmail.com wrote:
 Hi,I am working on a prototyope where i have a content source  i am indexing
 all documents  strore the index in solr.Now i have pre-condition that my
 content source is ever changing means there is always new content added to
 it. As i have read that solr use to do indexing on full source only
 everytime solr is asked for indexing.But this may lead to underutilization
 of reosources as same documents are getting reindexed again  again.Is there
 any approach to handle such scenarios. E.g. I have 1  documents in my
 source which have been indexed by solr till today. But next day my source
 has 11000 documents. So i want to i ndex only new 1000 documents not all
 11000. Can anybody suggest for this?Thanks in advance.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117068.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Navaa
hi,
Thanks for your reply..

I'm a beginner with Solr; kindly elaborate in more detail, because in my
solrconfig.xml

requestHandler name=/select class=solr.SearchHandler
 lst name=defaults
   str name=echoParamsexplicit/str
   int name=rows5/int
   str name=dfname/str
  
 /lst
  /requestHandler
  
  requestHandler name=standard class=solr.StandardRequestHandler
default=true /
  requestHandler name=/update class=solr.XmlUpdateRequestHandler /
  requestHandler name=/admin/
class=org.apache.solr.handler.admin.AdminHandlers /
  
  requestHandler name=/analysis/field
class=solr.FieldAnalysisRequestHandler/
  
  requestHandler name=/get class=solr.RealTimeGetHandler
lst name=defaults
str name=omitHeadertrue/str
str name=wtjson/str
str name=indenttrue/str
/lst
  /requestHandler

where can I add this qf parameter for those two fields?
I hope you understand the scenario.
Thanks in advance 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-phonetic-by-DoubleMetaphone-soundex-encoder-tp4116885p4117073.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr delta indexing approach

2014-02-12 Thread lalitjangra
Thanks Alex,

Yes, my source system maintains the creation and last modification time of
each document.

As per your inputs, can I assume that the next time Solr starts indexing,
it scans all the documents present in the source but only picks for indexing
those which are either new or have been updated since the last successful
indexing?

How does Solr do this, or in short, what is Solr's strategy for indexing? I
would definitely like to know more about it, and if you can share your
thoughts on the same, it would be great.

Regards,
Lalit.







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117068p4117077.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr delta indexing approach

2014-02-12 Thread Sadler, Anthony
I had this problem when I started to look at Solr as an index for a file 
server. What I ended up doing was writing a perl script that did this:

- Scan the whole filesystem and create an XML that is submitted into Solr for 
indexing. As this might be some 600,000 files, I break it down into chunks of N 
files (N = 200 currently).
- At the end of a successful scan, write out the time it started to a file.
- Next time you run the script, the script looks for the start time file. It 
reads that in and checks every file in the system:
=> If it has a mod_time greater than the begin_time, it has changed since we 
last updated it, so reindex it.
=> If it doesn't, just update the last_seen timestamp in Solr (a field we 
created) so we know it's still there.

We're doing that and it's indexing just fine.
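
For illustration, a rough SolrJ sketch of the same idea (the field names, paths and core URL are assumptions, the timestamp bookkeeping is left as comments, and the atomic update of last_seen assumes the update log is enabled and fields are stored):

import java.io.File;
import java.util.Collections;
import java.util.Date;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DeltaIndexer {

    // Walk the tree; reindex files changed since lastRun, otherwise just bump last_seen.
    static void index(File dir, long lastRun, long thisRun, SolrServer solr) throws Exception {
        File[] children = dir.listFiles();
        if (children == null) return;
        for (File f : children) {
            if (f.isDirectory()) {
                index(f, lastRun, thisRun, solr);
                continue;
            }
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", f.getAbsolutePath());
            if (f.lastModified() > lastRun) {
                // Changed (or new) since the last successful run: reindex it fully.
                doc.addField("mod_time", new Date(f.lastModified()));
                doc.addField("last_seen", new Date(thisRun));
                // ... add extracted content fields here ...
            } else {
                // Unchanged: atomic update that only touches last_seen.
                doc.addField("last_seen", Collections.singletonMap("set", new Date(thisRun)));
            }
            solr.add(doc);
        }
    }

    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/files");
        long lastRun = 0L;                          // read from the timestamp file of the previous run
        long thisRun = System.currentTimeMillis();
        index(new File("/data/share"), lastRun, thisRun, solr);
        solr.commit();
        // ... persist thisRun to the timestamp file for the next run ...
    }
}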

-Original Message-
From: lalitjangra [mailto:lalit.j.jan...@gmail.com]
Sent: Thursday, 13 February 2014 4:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr delta indexing approach

Thanks Alex,

Yes my source system maintains the crettion  last modificaiton system of each 
document.

As per your inputs, can i assume that next time when solr starts indexing, it  
scans all the prsent in source but only picks those for indexing which are 
either new or have been updated since last successful indexing.

How solr does this or in short what is solr strategy for indexing? I would 
definitely like to know more about it  if you can share your thoughts on same, 
it would be great.

Regards,
Lalit.







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117068p4117077.html
Sent from the Solr - User mailing list archive at Nabble.com.







Re: Solr delta indexing approach

2014-02-12 Thread Alexandre Rafalovitch
I'd start from doing Solr tutorial. It will explain a lot of things.
But in summary, you can send data to Solr (best option) or you can
pull it using DataImportHandler. Take your pick, do the tutorial,
maybe read some books. Then come back with specific questions of where
you started.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Feb 13, 2014 at 12:45 PM, lalitjangra lalit.j.jan...@gmail.com wrote:
 Thanks Alex,

 Yes my source system maintains the crettion  last modificaiton system of
 each document.

 As per your inputs, can i assume that next time when solr starts indexing,
 it  scans all the prsent in source but only picks those for indexing which
 are either new or have been updated since last successful indexing.

 How solr does this or in short what is solr strategy for indexing? I would
 definitely like to know more about it  if you can share your thoughts on
 same, it would be great.

 Regards,
 Lalit.







 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117068p4117077.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr delta indexing approach

2014-02-12 Thread Walter Underwood
Why write a Perl script for that?

touch new_timestamp
find . -newer timestamp | script-to-submit && mv new_timestamp timestamp

Neither approach deals with deleted files.

To do this correctly, you need lists of all the files in the index with their 
timestamps, and of all the files in the repository. Then you need to difference 
them to find deleted ones, new ones, and ones that have changed. You might even 
want to track links and symlinks to get dupes and canonical paths.

wunder

On Feb 12, 2014, at 10:00 PM, Sadler, Anthony anthony.sad...@yrgrp.com 
wrote:

 I had this problem when I started to look at Solr as an index for a file 
 server. What I ended up doing was writing a perl script that did this:
 
 - Scan the whole filesystem and create an XML that is submitted into Solr for 
 indexing. As this might be some 600,000 files, I break it down into chunks of 
 N files (N = 200 currently).
 - At the end of a successful scan, write out the time it started to a file.
 - Next time you run the script, the script looks for the start time file. It 
 reads that in and checks every file in the system:
 => If it has a mod_time greater than the begin_time, it has changed since we 
 last updated it, so reindex it.
 => If it doesn't, just update the last_seen timestamp in Solr (a field we 
 created) so we know it's still there.
 
 We're doing that and it's indexing just fine.
 
 -Original Message-
 From: lalitjangra [mailto:lalit.j.jan...@gmail.com]
 Sent: Thursday, 13 February 2014 4:45 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr delta indexing approach
 
 Thanks Alex,
 
 Yes, my source system maintains the creation and last modification time of 
 each document.
 
 As per your inputs, can I assume that the next time Solr starts indexing, it 
 scans all the documents present in the source but only picks for indexing 
 those which are either new or have been updated since the last successful 
 indexing?
 
 How does Solr do this, or in short, what is Solr's strategy for indexing? I 
 would definitely like to know more about it, and if you can share your 
 thoughts on the same, that would be great.
 
 Regards,
 Lalit.
 
 
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117068p4117077.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: Solr delta indexing approach

2014-02-12 Thread lalitjangra
Thanks all.

I am following couple of articles for same.

I am sending data to Solr instead of using DIH, and I am able to successfully
index data in Solr.

My concern here is how to minimize Solr indexing so that only updated data is
indexed each time, rather than all data items.

Is this something available OOTB in Solr, or do we need to implement it
ourselves?

Regards.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117068p4117087.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Integrating Oauth2 with Solr MailEntityProcessor

2014-02-12 Thread Dileepa Jayakody
Hi again,

Anybody interested in this feature for Solr MailEntityProcessor?
WDYT?

Thanks,
Dileepa


On Thu, Jan 30, 2014 at 11:00 AM, Dileepa Jayakody 
dileepajayak...@gmail.com wrote:

 Hi All,

 I think OAuth2 integration is a valid use case for Solr when it comes to
 importing data from user accounts such as email, social networks, enterprise
 stores, etc.
 Do you think OAuth2 integration in Solr would be a useful feature? If so, I
 would like to start working on this.
 I feel this could also be a good project for GSoC 2014.

 Thanks,
 Dileepa


 On Wed, Jan 29, 2014 at 3:57 PM, Dileepa Jayakody 
 dileepajayak...@gmail.com wrote:

 Hi All,

 I'm doing a research project on Email Reputation Analysis, and for this
 project I'm planning to use the Apache Solr, Tika and Mahout projects to
 analyse, store and query the reputation of emails and correspondents.

 For indexing emails in Solr I'm going to use the MailEntityProcessor [1].
 But I see that it requires the user to provide their email credentials to
 the DIH, which is a security risk. I also feel that the current
 MailEntityProcessor doesn't allow importing data from multiple mailboxes.

 What do you think of integrating an authorization mechanism like OAuth2
 in Solr?
 I would appreciate your ideas on using this for indexing multiple mailboxes
 without requiring users to give their usernames and passwords.

 <document>
   <entity processor="MailEntityProcessor"
           user="someb...@gmail.com" password="something"
           host="imap.gmail.com" protocol="imaps"
           folders="x,y,z"/>
 </document>

 Regards,
 Dileepa

 [1] http://wiki.apache.org/solr/MailEntityProcessor





RE: Solr delta indexing approach

2014-02-12 Thread Sadler, Anthony
At the risk of derailing the thread:

We do a lot more in the script than is mentioned here: we pull out parts of the 
path and mangle them (for example, turning them into a UNC path for users to use, 
or pulling out a client name or job number using a known folder structure). As for 
deleted files, here's how the script works in totality:

- The script runs the first time, finds every file and puts it into the formerly 
empty Solr DB. For every file found, it sets date_last_seen = current_time. It 
writes out the last_begin_time file and ends.
- A secondary function of the script runs sometime after and looks for any file 
with a date_last_seen < last_begin_time. Nothing is found this time around.
- The script runs the next time, sees that there is a last_begin_time file, and 
reads in that time. The script then runs in a delta mode, looking for all files 
modified later than last_begin_time. If it finds them, it re-indexes them and 
their contents. All other files, with a mod_time less than last_begin_time, 
merely have their date_last_seen updated (see the sketch below). The script 
ends and writes out the last_begin_time file.
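
Roughly, the per-file decision in delta mode looks like this sketch (Java
rather than the actual Perl, non-recursive for brevity, and the field handling
is only illustrative):

import java.io.File;

public class DeltaCheck {

    // Decide what to do with one file, given the start time of the last successful run.
    static void visit(File f, long lastBeginTime) {
        if (f.lastModified() > lastBeginTime) {
            // Changed since the last run: re-extract and re-index its content,
            // and refresh date_last_seen.
            System.out.println("REINDEX          " + f.getPath());
        } else {
            // Unchanged: only refresh date_last_seen so we know it still exists.
            System.out.println("UPDATE last_seen " + f.getPath());
        }
    }

    public static void main(String[] args) {
        // In the real script this comes from the last_begin_time file; here, "yesterday".
        long lastBeginTime = System.currentTimeMillis() - 24L * 3600 * 1000;
        File[] files = new File(args.length > 0 ? args[0] : ".").listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) {
                    visit(f, lastBeginTime);
                }
            }
        }
    }
}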

At this point, any files that were deleted between the first and second run 
were not updated, so their date_last_seen is different from all the others. 
This gives me something to look for. 

- A secondary function of the script runs sometime after and looks for any file 
with a date_last_seen < last_begin_time. This time around, some files are found. 
These files have their isDeleted field in Solr set to 1. 

Hopefully that makes a bit more sense. 
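
For what it's worth, that secondary pass could be expressed in SolrJ along these
lines (a sketch only, not the actual Perl; the core URL and timestamp are
placeholders, paging is omitted, and the atomic "set" update assumes an
updateLog is configured in solrconfig.xml):

import java.util.Collections;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class MarkDeleted {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/files");
        String lastBeginTime = "2014-02-12T00:00:00Z"; // read from the last_begin_time file

        // Find documents whose date_last_seen was not refreshed by the latest run.
        // Inclusive upper bound; the real check is "strictly earlier than last_begin_time".
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("date_last_seen:[* TO " + lastBeginTime + "]");
        q.setRows(1000); // a real script would page through the results

        QueryResponse rsp = solr.query(q);
        for (SolrDocument hit : rsp.getResults()) {
            // Atomic update: flag the stale document as deleted without re-sending its content.
            SolrInputDocument update = new SolrInputDocument();
            update.addField("id", hit.getFieldValue("id"));
            update.addField("isDeleted", Collections.singletonMap("set", 1));
            solr.add(update);
        }
        solr.commit();
        solr.shutdown();
    }
}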


-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Thursday, 13 February 2014 5:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr delta indexing approach

Why write a Perl script for that?

touch new_timestamp
find . -newer timestamp | script-to-submit && mv new_timestamp timestamp

Neither approach deals with deleted files.

To do this correctly, you need lists of all the files in the index with their 
timestamps, and of all the files in the repository. Then you need to difference 
them to find deleted ones, new ones, and ones that have changed. You might even 
want to track links and symlinks to get dupes and canonical paths.

wunder

On Feb 12, 2014, at 10:00 PM, Sadler, Anthony anthony.sad...@yrgrp.com 
wrote:

 I had this problem when I started to look at Solr as an index for a file 
 server. What I ended up doing was writing a Perl script that did this:
 
 - Scan the whole filesystem and create an XML that is submitted into Solr for 
 indexing. As this might be some 600,000 files, I break it down into chunks of 
 N files (N = 200 currently).
 - At the end of a successful scan, write out the time it started to a file.
 - Next time you run the script, the script looks for the start time file. It 
 reads that in and checks every file in the system:
 => If it has a mod_time greater than the begin_time, it has changed since we 
 last updated it, so reindex it.
 => If it doesn't, just update the last_seen timestamp in Solr (a field we 
 created) so we know it's still there.
 
 We're doing that and it's indexing just fine.
 
 -Original Message-
 From: lalitjangra [mailto:lalit.j.jan...@gmail.com]
 Sent: Thursday, 13 February 2014 4:45 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr delta indexing approach
 
 Thanks Alex,
 
 Yes, my source system maintains the creation and last modification time of 
 each document.
 
 As per your inputs, can I assume that the next time Solr starts indexing, it 
 scans all the documents present in the source but only picks for indexing 
 those which are either new or have been updated since the last successful 
 indexing?
 
 How does Solr do this, or in short, what is Solr's strategy for indexing? I 
 would definitely like to know more about it, and if you can share your 
 thoughts on the same, that would be great.
 
 Regards,
 Lalit.
 
 
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-delta-indexing-approach-tp4117
 068p4117077.html Sent from the Solr - User mailing list archive at 
 Nabble.com.
 
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org