Re: Disable all caches in Solr
Thanks Chris. I understand this. But this test is to determine the *maximum* latency a query can have, hence I have disabled all caches. After disabling all caches in solrconfig, I was able to remove the latency variation for a single query in most cases. But *sort* queries still show variation in latency when executed multiple times. Is there some hidden cache for sorting? When I run the query below for the first time it shows higher latency, but when I run it a second time it shows a lower QTime.

http://localhost:7000/solr/collection1/select?q=field1:keyword&rows=20&sort=field2+desc

*If I remove the sorting then I always get a fixed QTime.* field2 is of type tlong. Any ideas why this is happening and how to prevent this variation?

-- View this message in context: http://lucene.472066.n3.nabble.com/Disable-all-caches-in-Solr-tp4144933p4146039.html Sent from the Solr - User mailing list archive at Nabble.com.
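The one-off slow first sort is most likely Lucene's FieldCache being populated: sorting on an indexed field un-inverts it into memory on first use, and that cache is internal to Lucene, not one of the Solr caches configurable in solrconfig.xml. A hedged sketch of one way around it (Solr 4.2+, and the exact schema line is an assumption based on the field named in the question): declare the sort field with docValues so the sort data is read from per-segment column storage instead of being un-inverted on the first query.

```xml
<!-- Assumed schema fragment: docValues="true" avoids the first-query
     FieldCache un-inversion that shows up as a one-off high QTime. -->
<field name="field2" type="tlong" indexed="true" stored="true" docValues="true"/>
```

Changing this requires a full re-index. Alternatively, a static warming query with the same sort in a newSearcher listener hides the first-hit cost, though that warming would defeat a "maximum latency" measurement.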
Changing default behavior of solr for overwrite the whole document on uniquekey duplication
Dears, Hi, For my requirements I need to change the default behavior of Solr on unique-key duplication, where it overwrites the whole document. I want it to overwrite just part of the document (some fields) while the other parts of the document (other fields) remain unchanged. First of all, I need to know whether such a change in Solr's behavior is possible. Second, I would really appreciate guidance on which class/classes I should look at to change this. Best regards. -- A.Nazemian
Solr irregularly having QTime 50000ms, stracing solr cures the problem
Hi all, This is what happens when I run a regular wget query to log the current number of documents indexed:

2014-07-08:07:23:28 QTime=20 numFound=5720168
2014-07-08:07:24:28 QTime=12 numFound=5721126
2014-07-08:07:25:28 QTime=19 numFound=5721126
2014-07-08:07:27:18 QTime=50071 numFound=5721126
2014-07-08:07:29:08 QTime=50058 numFound=5724494
2014-07-08:07:30:58 QTime=50033 numFound=5730710
2014-07-08:07:31:58 QTime=13 numFound=5730710
2014-07-08:07:33:48 QTime=50065 numFound=5734069
2014-07-08:07:34:48 QTime=16 numFound=5737742
2014-07-08:07:36:38 QTime=50037 numFound=5737742
2014-07-08:07:37:38 QTime=12 numFound=5738190
2014-07-08:07:38:38 QTime=23 numFound=5741208
2014-07-08:07:40:29 QTime=50034 numFound=5742067
2014-07-08:07:41:29 QTime=12 numFound=5742067
2014-07-08:07:42:29 QTime=17 numFound=5742067
2014-07-08:07:43:29 QTime=20 numFound=5745497
2014-07-08:07:44:29 QTime=13 numFound=5745981
2014-07-08:07:45:29 QTime=23 numFound=5746420

As you can see, the QTime jumps to just over 50 seconds at irregular intervals. This happens independent of whether I am indexing documents at around 20 dps or not. First I thought it depended on the auto-commit of 5 minutes, but the 50-second hits are too irregular.

Furthermore, and this is *really strange*: when hooking strace onto the solr process, the 50-second QTimes disappear completely and consistently --- a real Heisenbug. Nevertheless, strace shows that there is a socket timeout of 50 seconds defined in calls like this:

[pid 1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1, 5) = 1 ([{fd=96, revents=POLLIN}]) 0.40

where fd=96 is the result of

[pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET, sin_port=htons(57236), sin_addr=inet_addr(ip address of local host)}, [16]) = 96 0.54

where again fd=122 is the TCP port on which solr was started. My hunch is that this is communication between the cores of solr.

I tried to search the internet for such a strange connection between socket timeouts and strace, but could not find anything (the stackoverflow entry from yesterday is my own :-( This smells a bit like a race condition/deadlock kind of thing which is broken up by timing differences introduced by stracing the process. Any hints appreciated.

For completeness, here is my setup:
- solr-4.8.1,
- cloud version running
- 10 shards on 10 cores in one instance
- hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11, PATCHLEVEL 2
- hosted on a vmware, 4 CPU cores, 16 GB RAM
- single digit million docs indexed, exact number does not matter
- zero query load

Harald.
Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication
Please look at https://wiki.apache.org/solr/Atomic_Updates This does what you want: just update the relevant fields. Thanks, Himanshu On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian alinazem...@gmail.com wrote: Dears, Hi, According to my requirement I need to change the default behavior of Solr for overwriting the whole document on unique-key duplication. [...] -- A.Nazemian
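To make the Atomic Updates suggestion concrete, here is a hedged sketch (the uniqueKey value and field names are made up). It requires Solr 4.0+, and every field in the schema must be stored="true", because Solr rebuilds the untouched parts of the document from its stored fields.

```shell
# Sketch (assumed id and field names): this atomic update changes only the
# listed fields; all other fields of doc-1 are preserved from stored values.
cat > /tmp/atomic.json <<'EOF'
[{"id":"doc-1",
  "title":{"set":"new title"},
  "views":{"inc":1}}]
EOF
cat /tmp/atomic.json
# To apply against a running Solr (host/core are placeholders):
# curl -H 'Content-Type: application/json' \
#   'http://localhost:8983/solr/collection1/update?commit=true' \
#   --data-binary @/tmp/atomic.json
```

The "set" modifier replaces a field's value and "inc" increments a numeric field; fields not mentioned in the update are left as they were.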
Re: SOLR on hdfs
Hi all, I am new to Solr and HDFS. I am trying to index text content extracted from binary files like PDF, MS Office, etc., which are stored on HDFS (single node). So far I have Solr running on HDFS and have created the core, but I couldn't send the files to Solr for indexing. Can someone please help me do that? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-on-hdfs-tp4045128p4146049.html Sent from the Solr - User mailing list archive at Nabble.com.
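One common route for this, sketched here with placeholder host, core, and file names: Solr's ExtractingRequestHandler (Tika, from the solr-cell contrib, which must be configured in solrconfig.xml) extracts and indexes text from a binary file posted to /update/extract. Running Solr's index on HDFS does not let it read input files from HDFS, so the file is first copied locally with hdfs dfs -get.

```shell
# Sketch: commands are printed rather than executed, since they need a
# running Solr and HDFS; drop the echos to run them for real.
FILE=/tmp/report.pdf
CMD="curl 'http://localhost:8983/solr/collection1/update/extract?literal.id=doc1&commit=true' -F 'myfile=@$FILE'"
echo "hdfs dfs -get /data/report.pdf $FILE"   # copy the binary out of HDFS
echo "$CMD"                                   # post it for Tika extraction
```

The literal.id parameter supplies the uniqueKey for the extracted document; additional literal.* parameters can attach more metadata fields.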
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
My first assumption: full GC. Can you please tell us about your JVM setup, and maybe trace what happens in the JVMs? On Jul 8, 2014 9:54 AM, Harald Kirsch harald.kir...@raytion.com wrote: Hi all, This is what happens when I run a regular wget query to log the current number of documents indexed: [...]
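One way to confirm or rule out full GCs, as suggested above, is to restart Solr with GC logging enabled. A sketch with HotSpot flags of the Java 6/7 era; the log path and launch command are placeholders:

```shell
# Sketch: GC-logging flags to append to however Solr's JVM is launched.
# Every stop-the-world pause will then show up with a timestamp in the log.
GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/solr-gc.log"
echo "java $GC_OPTS -jar start.jar"
```

If a 50-second outage has no matching entry in the GC log, garbage collection can be excluded as the cause.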
Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication
Dear Himanshu, Hi, You misunderstood what I meant. I am not going to update some fields from the client side; I want to change what Solr does on duplication of the uniqueKey field. I don't want Solr to overwrite the whole document; I just want it to overwrite some parts of the document. This situation does not come from the user side; this is what Solr does to documents with a duplicated uniqueKey. Regards. On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra himanshu.mehro...@snapdeal.com wrote: Please look at https://wiki.apache.org/solr/Atomic_Updates This does what you want just update relevant fields. [...] -- A.Nazemian
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
No, no full GC. The JVM does nothing during the outages, no CPU, no GC, as checked with jvisualvm and htop. Harald. On 08.07.2014 10:12, Heyde, Ralf wrote: My First assumption: full gc. Can you please tell us about your jvm setup and maybe trace what happens the jvms? [...] -- Harald Kirsch Raytion GmbH Kaiser-Friedrich-Ring 74 40547 Duesseldorf Fon +49 211 53883-216 Fax +49-211-550266-19 http://www.raytion.com
Parallel optimize of index on SolrCloud.
Hi, I need to optimize an index created using the CloudSolrServer APIs under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried creating three separate HttpSolrServer instances and invoked httpSolrServer.optimize() on them in parallel, but it still seems to optimize sequentially. I also tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential.

*URL*: http://host:port/solr/collection/update
*Parameters*:
params.add(new BasicNameValuePair("optimize", "true"));
params.add(new BasicNameValuePair("maxSegments", "1"));
params.add(new BasicNameValuePair("waitFlush", "true"));
params.add(new BasicNameValuePair("distrib", "false"));

Kindly provide your suggestions and help. Regards, Modassar
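A hedged sketch of the direct-to-node approach described above (host names are placeholders): send the optimize to each node yourself with distrib=false, so no single node fans the request out, and background the requests so the three optimizes overlap. For this to help, each request must go to a different machine.

```shell
# Sketch: build the per-node optimize URL; distrib=false keeps each
# request local to the node that receives it.
optimize_url() {
  echo "http://$1/solr/collection/update?optimize=true&maxSegments=1&waitFlush=true&distrib=false"
}
for HOST in host1:8983 host2:8983 host3:8983; do
  # Printed rather than run; remove 'echo' (keeping the trailing &) to
  # fire the three requests in parallel, then 'wait' for them to finish.
  echo "curl -s '$(optimize_url "$HOST")' &"
done
```

Note that even in parallel, an optimize down to one segment is heavily I/O-bound on each machine, so wall-clock gains depend on the nodes having independent disks.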
[Solr Schema API] SolrJ Access
Hi guys, wondering if there is a proper way to access the Schema API via SolrJ. Of course it is possible to reach it from Java with a hand-built HTTP request, but that way, using SolrCloud for example, we become coupled to one specific instance (and we don't want that). Code example:

HttpResponse response;
String url = this.solrBase + "/" + core + SCHEMA_SOLR_FIELDS_ENDPOINT + fieldName;
HttpPut httpPut = new HttpPut(url);
StringEntity entity = new StringEntity(
    "{\"type\":\"text_general\",\"stored\":\"true\"}",
    ContentType.APPLICATION_JSON);
httpPut.setEntity(entity);
HttpClient client = new DefaultHttpClient();
response = client.execute(httpPut);

Any suggestions? In my opinion it would be interesting to have some auxiliary method in SolrServer if it's not there yet. Cheers -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Fwd: Language detection for solr 3.6.1
-- Forwarded message --
From: Poornima Jay poornima...@rocketmail.com
Date: Tue, Jul 8, 2014 at 5:03 PM
Subject: Re: Language detection for solr 3.6.1

When I try to use the solr-langid-3.6.1.jar file in my path /apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/ and define the path in solrconfig.xml as below

<lib dir="/home/searchuser/apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/" regex="solr-langid-.*\.jar" />

I am getting the error below while reloading the core:

SEVERE: java.lang.NoClassDefFoundError: com/cybozu/labs/langdetect/DetectorFactory

Please advise. Thanks, Poornima

On Tuesday, 8 July 2014 9:58 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

If you are having trouble with the jar location, just use an absolute path in your lib statement and use path, not dir/regex. That will complain louder. You should be using the latest jar matching the version; they should be shipped with Solr itself. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, Jul 8, 2014 at 11:14 AM, Poornima Jay poornima...@rocketmail.com wrote:

I am facing an issue with the jar file location. Where should I place solr-langid-3.6.1.jar? If I place it in the instance folder inside /lib/solr-langid-3.6.1.jar, the language detection classes are not loaded. Should I use solr-langid-3.5.1.jar in Solr version 3.6.1? Can you please attach the schema file also for reference?

<lib dir="${user.dir}/../dist/" regex="solr-langid-.*\.jar" />
<lib dir="${user.dir}/../contrib/langid/lib/" />

Where exactly should the jar file be placed, /dist/ or /contrib/langid/lib/? Thanks for your time. Regards, Poornima

On Monday, 7 July 2014 2:42 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

I've had an example in my book: https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml , though it was for Solr 4.2+. Solr in Action also has a section on multilingual indexing. There is no generic advice, as everybody seems to have slightly different multilingual requirements, but the books will at least discuss the main issues. Regarding your specific email from a week ago, you haven't actually said what the problem was, just what you did. So we don't know where you are stuck and what - specifically - you need help with. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Mon, Jul 7, 2014 at 4:06 PM, Poornima Jay poornima...@rocketmail.com wrote:

Hi, Please let me know if anyone has used Google language detection for implementing multilanguage search in one schema. Thanks, Poornima

On Tuesday, 1 July 2014 6:54 PM, Poornima Jay poornima...@rocketmail.com wrote:

Hi, Can anyone please let me know how to integrate http://code.google.com/p/language-detection/ in Solr 3.6.1? I want four languages (English, Chinese simplified, Chinese traditional, Japanese, and Korean) to be added in one schema, i.e. multilingual search from a single schema file.

I tried adding solr-langdetect-3.5.0.jar in my /solr/contrib/langid/lib/ location and in /webapps/solr/WEB-INF/contrib/langid/lib/ and made the changes in solrconfig.xml as below:

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="invariants">
      <str name="langid.fl">content_eng</str>
      <str name="langid.map">true</str>
      <str name="langid.map.fl">content_eng,content_ja</str>
      <str name="langid.whitelist">en,ja</str>
      <str name="langid.map.lcmap">en:english ja:japanese</str>
      <str name="langid.fallback">en</str>
    </lst>
  </processor>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">langid</str>
  </lst>
</requestHandler>

Please suggest a solution. Thanks, Poornima
Re: Fwd: Language detection for solr 3.6.1
When I use the solr-langid-3.5.0.jar file, after reloading the core I am getting the error below:

SEVERE: java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException

even after adding the solr-jsonic-3.5.0.jar file in the webapps folder. Thanks, Poornima

On Tuesday, 8 July 2014 3:36 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

-- Forwarded message --
From: Poornima Jay poornima...@rocketmail.com
Date: Tue, Jul 8, 2014 at 5:03 PM
Subject: Re: Language detection for solr 3.6.1

When i try to use solr-langid-3.6.1.jar file in my path /apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/ and define the path in the solrconfig.xml [...]
Re: Fwd: Language detection for solr 3.6.1
I just realized you are not using Solr's language detection libraries; you are using a third-party one. You did mention that in your first message. I don't see that library integrated with Solr, though, just as a standalone library. So you can't just plug it in. Is there any reason you cannot use one of the two libraries Solr does already have (Tika's and Google's)? What's so special about that one? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, Jul 8, 2014 at 5:08 PM, Poornima Jay poornima...@rocketmail.com wrote:

When i use solr-langid-3.5.0.jar file after reloading the core i am getting the below error SEVERE: java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException Even after adding the solr-jsonic-3.5.0.jar file in the webapps folder. [...]
don't count facet on blank values
Hi, Is it possible not to count the facets for blank values? e.g. for the field cat:

cats:[*"",34324,* 10,8635, 20,8226, 50,5162, 30,759, 100,188, 40,13, 200,7]

How is this possible? With Regards Aman Tandon
Re: Fwd: Language detection for solr 3.6.1
I'm using the Google library, which I had mentioned in my first mail, saying I'm using http://code.google.com/p/language-detection/. I have downloaded the jar file from the URL below:

https://www.versioneye.com/java/org.apache.solr:solr-langid/3.6.1

Please let me know from where I need to download the correct jar file. Regards, Poornima

On Tuesday, 8 July 2014 3:42 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

I just realized you are not using Solr language detect libraries. You are using third party one. You did mention that in your first message. I don't see that library integrated with Solr though, just as a standalone library. So, you can't just plug in it. Is there any reason you cannot use one of the two libraries Solr does already have (Tika's and Google's)? [...]
Re: don't count facet on blank values
Do you need those values stored/indexed? If not, why not remove them before they hit Solr with an appropriate UpdateRequestProcessor? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Jul 8, 2014 at 5:16 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Is it possible to not count the facets for blank values? e.g. cat: ["",34324, "10",8635, "20",8226, "50",5162, "30",759, "100",188, "40",13, "200",7] How is this possible? With Regards Aman Tandon
Re: don't count facet on blank values
On 8 July 2014 15:46, Aman Tandon amantandon...@gmail.com wrote: Hi, Is it possible to not count the facets for blank values? e.g. cat: [...] Either filter them out in the query, or remove them client-side when displaying the results. Regards, Gora
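Gora's second option — dropping the blank bucket client-side before display — can be sketched as a small standalone helper. This is plain Java with no Solr dependency; the Map simply stands in for the parsed facet value/count pairs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetFilter {
    // Drop facet buckets whose value is null or blank, preserving order.
    public static Map<String, Integer> withoutBlanks(Map<String, Integer> facets) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : facets.entrySet()) {
            if (e.getKey() != null && !e.getKey().trim().isEmpty()) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

Applied to the example counts above, the "",34324 bucket would be removed while the real category buckets pass through untouched.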
Re: don't count facet on blank values
@Alex, yes we need them to be indexed and stored, as we do some processing when fields are blank. @Gora Thanks, I will try that. Thanks for your quick replies. With Regards Aman Tandon On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty g...@mimirtech.com wrote: On 8 July 2014 15:46, Aman Tandon amantandon...@gmail.com wrote: Hi, Is it possible to not count the facets for blank values? e.g. cat: [...] Either filter them out in the query, or remove them client-side when displaying the results. Regards, Gora
Re: don't count facet on blank values
Right, but a blank field and a missing field are different things. Are they for you? If yes, then correct, you are stuck with getting them back. But if a blank field is the same as a missing/empty field, then you can pre-process to unify them. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Jul 8, 2014 at 5:26 PM, Aman Tandon amantandon...@gmail.com wrote: @Alex, yes we need them to be indexed and stored, as we do some processing when fields are blank. @Gora Thanks, I will try that. Thanks for your quick replies. With Regards Aman Tandon On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty g...@mimirtech.com wrote: On 8 July 2014 15:46, Aman Tandon amantandon...@gmail.com wrote: Hi, Is it possible to not count the facets for blank values? e.g. cat: [...] Either filter them out in the query, or remove them client-side when displaying the results. Regards, Gora
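When blank and missing really are the same thing, the pre-processing Alex suggests can be sketched with the stock update processors that ship with Solr 4.x — the chain name here is illustrative, and with no field selector configured these factories apply to all string-valued fields:

```xml
<updateRequestProcessorChain name="strip-blanks">
  <!-- Trim surrounding whitespace, then drop any field whose value is now empty,
       so "" and a missing field become indistinguishable at index time -->
  <processor class="solr.TrimFieldUpdateProcessorFactory" />
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

As with any chain, it must be referenced from the update handler (e.g. update.chain=strip-blanks) to take effect, and blank values already in the index are only removed once the documents are reindexed.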
Re: Facets on Nested documents
Yes, I have the same problem. In my case I have two types (parent and children) in a single collection, and I want to retrieve only the parents with a facet on a child field. I've seen that this is possible via a block join query (available since Solr 4.5). I have Solr 1.2, and I've thought about a static facet field calculated at indexing time, but I don't see any guide or reference about it. Walter Ing. Walter Liguori 2014-07-07 17:59 GMT+02:00 adfel70 adfe...@gmail.com: Hi, I indexed different types (different fields) of child docs for every parent. I want to facet on a field in one type of child doc and then do another facet on a different type of child doc. It doesn't work. Any idea how I can do something like that? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Facets-on-Nested-documents-tp4145931.html Sent from the Solr - User mailing list archive at Nabble.com.
JOB: Solr / Elasticsearch engineer @ Sematext
Hi, I think most people on this list have heard of Sematext http://sematext.com/, so I'll skip the company info, and just jump to the meat, which involves a lot of fun work with Solr and/or Elasticsearch: We have an opening for an engineer who knows either Elasticsearch or Solr or both and wants to use these technologies to implement search and analytics solutions for both Sematext's own products http://sematext.com/products/ such as SPM http://sematext.com/spm/ (monitoring, alerting, machine learning-based anomaly detection, etc.) and Logsene http://sematext.com/logsene/ (logging), as well as for Sematext's clients http://sematext.com/clients/. More info at: * http://blog.sematext.com/2014/07/07/job-elasticsearch-solr-engineer/ * http://sematext.com/about/jobs.html Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
[ANN] Solr Users Thailand - unofficial group
Hello, A new Google Group has recently been started for Solr users who want to discuss Solr in Thai or need to discuss Solr issues around the Thai language (in Thai or English). https://groups.google.com/forum/#!forum/solr-user-thailand The group is monitored by the local Solr consultancy, one of the Thai LucidWorks employees, and myself. It has just started, but if this language is of interest to you, please join and help build a vibrant community. As mentioned in the subject, this is not an official group. I hope, though, that it will become active enough over time to be listed next to the other user groups on the Wiki. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
I need a replacement for the QueryElevation Component
Good morning to one and all, I'm using Solr 4.0 Final and I've been struggling mightily with the elevation component. It is too limited for our needs; it doesn't handle phrases very well, and I need to have more than one doc with the same keyword or phrase. So I need a better solution. One that allows us to tag a doc with keywords that clearly identify it as a promoted document would be ideal. I tried using an external file field, but that only allows numbers and not strings (please correct me if I'm wrong). EFF would be ideal if there were a way to make it take strings. I also need an easy way to add these tags to specific docs. If possible, I would like to avoid creating a separate elevation core, but it may come down to that... Thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/I-need-a-replacement-for-the-QueryElevation-Component-tp4146077.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: don't count facet on blank values
No, both are the same for me. With Regards Aman Tandon On Tue, Jul 8, 2014 at 4:01 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Right, but a blank field and a missing field are different things. Are they for you? If yes, then correct, you are stuck with getting them back. But if a blank field is the same as a missing/empty field, then you can pre-process to unify them. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Jul 8, 2014 at 5:26 PM, Aman Tandon amantandon...@gmail.com wrote: @Alex, yes we need them to be indexed and stored, as we do some processing when fields are blank. @Gora Thanks, I will try that. Thanks for your quick replies. With Regards Aman Tandon On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty g...@mimirtech.com wrote: On 8 July 2014 15:46, Aman Tandon amantandon...@gmail.com wrote: Hi, Is it possible to not count the facets for blank values? e.g. cat: [...] Either filter them out in the query, or remove them client-side when displaying the results. Regards, Gora
Slow inserts when using Solr Cloud
Hi I'm encountering a surprisingly high increase in response times when I insert new documents into a SolrCloud, compared with a standalone Solr instance. I have a SolrCloud set up for test and evaluation purposes. I have four shards, each with a leader and a replica, distributed over four Windows virtual servers. I have ZooKeeper running on three of the four servers. There are not many documents in my SolrCloud (just a few hundred). I am using composite id routing, specifying a prefix to my document ids which is then used by Solr to determine which shard the document should be stored on. I determine in advance which shard a document with a given id prefix will end up in, by trying it out in advance. I then try the following scenarios, using inserts without commits. E.g. I use: curl http://servername:port/solr/update -H "Content-Type: text/xml" --data-binary @test.txt 1. Insert a document, sending it to the server hosting the correct shard, with replicas turned off (response time 20ms) I find that if I 'switch off' the replicas for my shard (by shutting down Solr for the replicas), and then I send the new document to the server hosting the leader for the correct shard, then I get a very fast response, i.e. under 10ms, which is similar to the performance I get when not using SolrCloud. This is expected, as I've removed any overhead to do with replicas or routing to the correct shard. 2. Insert a document, sending it to the server hosting the correct shard, but with replicas turned on (response time approx 250ms) If I switch on the replica for that shard, then my average response time for an insert increases from 10ms to around 250ms. Now I expect an overhead, because the leader has to find out where the replica is (from ZooKeeper?) and then forward the request to that replica, then wait for a reply - but an increase from 20ms to 250ms seems very high? 3.
Insert a document, sending it to a server hosting the incorrect shard, with replicas turned on (response time approx 500ms) If I do the same thing again but this time send to a server hosting a different shard from the one my document will end up in, the average response times increase again to around 500ms. Again, I'd expect an increase because of the extra step of needing to forward to the correct shard, but the increase seems very high? Should I expect this much of an overhead for shard routing and replicas, or might this indicate a problem in my configuration? Many thanks Ian --- Mae'r wybodaeth a gynhwysir yn y neges e-bost hon ac yn unrhyw atodiadau'n gyfrinachol. Os ydych yn ei derbyn ar gam, rhowch wybod i'r anfonwr a'i dileu'n ddi-oed. Ni fwriedir i ddatgelu i unrhyw un heblaw am y derbynnydd, boed yn anfwriadol neu fel arall, hepgor cyfrinachedd. Efallai bydd Gwasanaeth Gwybodeg GIG Cymru (NWIS) yn monitro ac yn cofnodi pob neges e-bost rhag firysau a defnydd amhriodol. Mae'n bosibl y bydd y neges e-bost hon ac unrhyw atebion neu atodiadau dilynol yn ddarostyngedig i'r Ddeddf Rhyddid Gwybodaeth. Mae'r farn a fynegir yn y neges e-bost hon yn perthyn i'r anfonwr ac nid ydynt o reidrwydd yn perthyn i NWIS. The information included in this email and any attachments is confidential. If received in error, please notify the sender and delete it immediately. Disclosure to any party other than the addressee, whether unintentional or otherwise, is not intended to waive confidentiality. The NHS Wales Informatics Service (NWIS) may monitor and record all emails for viruses and inappropriate use. This e-mail and any subsequent replies or attachments may be subject to the Freedom of Information Act. The views expressed in this email are those of the sender and not necessarily of NWIS. ---
RE: Exact Match first in the list.
Thanks Shawn, I am already using boosting, but the OR condition works for me as you mentioned. One question: if I search the field (TAGs) without parentheses it returns a lot of results, but if I try with parentheses, something like (TAGs), it returns fewer. Why do the parentheses change the results? They don't take the exact match..? Let me know if I am missing something. Thanks -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Monday, July 07, 2014 8:22 PM To: solr-user@lucene.apache.org Subject: Re: Exact Match first in the list. Hi, I have a situation where I am applying the search rules below. When I search columns for the full-text search "Product Variant Name", the exact match has to be first in the list, and other matches, like product or variant or name or any combination, will be next in the results. Any thoughts on which analyzer or tokenizer or filter I need to use? This is more a matter of boosting than analysis. If you are using edismax, this is particularly easy. Just put large boost values on the fields in the pf parameter, and you'd likely want to use the same field list as the qf parameter. If you are not using edismax and can construct such a query yourself, you can boost the phrase over the individual terms. Here's a sample query: "Product Variant Name"^10 OR (Product Variant Name) This is essentially what edismax will do with a boost on the pf values, except that it will work with more than one field. The edismax parser is a wonderful creation. Thanks, Shawn
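Shawn's edismax suggestion can be sketched as a solrconfig.xml request-handler definition — the field names and boost values below are made up for illustration:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: fields matched term-by-term; pf: the same fields, boosted
         heavily when the whole query matches as a phrase -->
    <str name="qf">product_name^2 description</str>
    <str name="pf">product_name^100 description^10</str>
  </lst>
</requestHandler>
```

With this in place, a query like q=Product Variant Name ranks documents containing the exact phrase above documents that merely contain the individual terms, which is the "exact match first" behavior being asked for.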
Re: I need a replacement for the QueryElevation Component
You can sponsor more than one document per keyword: <query text="AAA"> <doc id="A" /> <doc id="B" /> </query> And you might want to try <str name="queryFieldType">string</str> instead of another field type. I found that text fields removed whitespace and concatenated the tokens. Not sure if this is intended or not. -- View this message in context: http://lucene.472066.n3.nabble.com/I-need-a-replacement-for-the-QueryElevation-Component-tp4146077p4146090.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Slow inserts when using Solr Cloud
Updates are currently done locally before concurrently being sent to all replicas - so on a single update, you can expect 2x just from that. As for your results, it sounds like perhaps there is more overhead than we would like in the code that sends to replicas and forwards updates? Someone would have to dig in to really know I think. I would doubt it’s a configuration issue, but you never know. -- Mark Miller about.me/markrmiller On July 8, 2014 at 9:18:28 AM, Ian Williams (NWIS - Applications Design) (ian.willi...@wales.nhs.uk) wrote: Hi I'm encountering a surprisingly high increase in response times when I insert new documents into a SolrCloud, compared with a standalone Solr instance. I have a SolrCloud set up for test and evaluation purposes. I have four shards, each with a leader and a replica, distributed over four Windows virtual servers. I have zookeeper running on three of the four servers. There are not many documents in my SolrCloud (just a few hundred). I am using composite id routing, specifying a prefix to my document ids which is then used by Solr to determine which shard the document should be stored on. I determine in advance which shard a document with a given id prefix will end up in, by trying it out in advance. I then try the following scenarios, using inserts without commits. E.g. I use: curl http://servername:port/solr/update -H Content-Type: text/xml --data-binary @test.txt 1. Insert a document, sending it to the server hosting the correct shard, with replicas turned off (response time 20ms) I find that if I 'switch off' the replicas for my shard (by shutting down Solr for the replicas), and then I send the new document to the server hosting the leader for the correct shard, then I get a very fast response, i.e. under 10ms, which is similar to the performance I get when not using SolrCloud. This is expected, as I've removed any overhead to do with replicas or routing to the correct shard. 2. 
Insert a document, sending it to the server hosting the correct shard, but with replicas turned on (response time approx 250ms) If I switch on the replica for that shard, then my average response time for an insert increases from 10ms to around 250ms. Now I expect an overhead, because the leader has to find out where the replica is (from Zookeeper?) and then forward the request to that replica, then wait for a reply - but an increase from 20ms to 250ms seems very high? 3. Insert a document, sending it to a server hosting the incorrect shard, with replicas turned on (response time approx 500ms) If I do the same thing again but this time send to the server hosting a different shard to the shard my document will end up in, the average response times increase again to around 500ms. Again, I'd expect an increase because of the extra step of needing to forward to the correct shard, but the increase seems very high? Should I expect this much of an overhead for shard routing and replicas, or might this indicate a problem in my configuration? Many thanks Ian
Re: Parallel optimize of index on SolrCloud.
You probably do not need to force merge (mistakenly called optimize) your index. Solr does automatic merges, which work just fine. There are only a few situations where a forced merge is even a good idea. The most common one is a replicated (non-cloud) setup with a full reindex every night. If you use SolrCloud, I cannot think of a situation where you would want a forced merge. wunder On Jul 8, 2014, at 2:01 AM, Modassar Ather modather1...@gmail.com wrote: Hi, I need to optimize an index created using the CloudSolrServer API under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried making three separate HttpSolrServer instances and invoked httpSolrServer.optimize() on them in parallel, but it still seems to do the optimization sequentially. I tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential. *URL*: http://host:port/solr/collection/update *Parameters*: params.add(new BasicNameValuePair("optimize", "true")); params.add(new BasicNameValuePair("maxSegments", "1")); params.add(new BasicNameValuePair("waitFlush", "true")); params.add(new BasicNameValuePair("distrib", "false")); Kindly provide your suggestion and help. Regards, Modassar
Re: Transparently rebalancing a Solr cluster without splitting or moving shards
Thanks for your suggestions and recommendations. If I understand correctly, the MIGRATE command does shard splitting (around the range of the split.key) and merging behind the scenes. Though it's a bit difficult to properly monitor the actual migration, set the proper timeouts, know when to direct indexing and search traffic to the destination collection, etc. Not sure how to MIGRATE an entire collection. By providing the full list of split.keys? I'd be surprised if that was doable, but I guess it would skip the splitting part, which makes it easier ;-) Or much tougher, by splitting around all the ranges. More seriously, doing a MERGEINDEXES at the core level might not be a bad alternative, provided the hash ranges are compatible. Damien On 07/07/2014 05:14 PM, Shawn Heisey wrote: I don't think you'd want to disable mmap. It could be done, by choosing another DirectoryFactory object. Adding memory is likely to be the only sane way forward. Another possibility would be to bump up the maxShardsPerNode value and build the new collection (with the proper number of shards) only on the new machines... Then when they are built, move them to their proper homes and manually adjust the cluster state in ZooKeeper. This will still generate a lot of I/O, but hopefully it will last for less time on the wall clock, and it will be something you can do when load is low. After that's done and you've switched to it, you can add replicas with either the ADDREPLICA collections API or the core admin API. You should be on the newest Solr version... Lots of bugs have been found and fixed. One thing I wonder is whether the MIGRATE API can be used on an entire collection. It says it works by shard key, but I suspect that most users will not be using that functionality. Thanks, Shawn
SolrCloud delete replica
Hi, I have an issue regarding collection deletion. When a Solr node is down and I delete a collection, everything seems fine and the collection is deleted from the cluster state too. But when the dead node comes back, it registers the collection again. Even when I delete the replica with the DELETEREPLICA collection API, the core inside the dead node starts to push the collection back into clusterstate.json. What is the source of truth for SolrCloud: ZooKeeper, the Solr node, or the leader? Is there a way to unload or delete the core on the down node after it becomes active? Thanks
Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication
I think you are misunderstanding what Himanshu is suggesting to you. You don't need to make lots of big changes to the internals of Solr's code to get what you want -- instead you can leverage the Atomic Updates and Optimistic Concurrency features of Solr to get the existing Solr internals to reject any attempt to add a duplicate document unless the client code sending the document specifies it should be an update. This means your client code needs to be a bit more sophisticated, but the benefit is that you don't have to try to make complex changes to the internals of Solr that may be impossible and/or difficult to support/upgrade later. More details... https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency Simplest possible idea based on the basic info you have given so far... 1) send every doc using _version_=-1 2a) if the doc update fails with error 409, that means a version of this doc already exists 2b) resend just the field changes (using the "set" atomic operation) and specify _version_=1 : Dear Himanshu, : Hi, : You misunderstood what I meant. I am not going to update some field. I am : going to change what Solr does on duplication of the uniqueKey field. I don't want : Solr to overwrite the whole document; I just want to overwrite some parts of the : document. This situation does not come from the user side; this is what Solr does : to documents with a duplicated uniqueKey. : Regards. : : : On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra : himanshu.mehro...@snapdeal.com wrote: : : Please look at https://wiki.apache.org/solr/Atomic_Updates : : This does what you want: just update the relevant fields. : : Thanks, : Himanshu : : : On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian alinazem...@gmail.com : wrote: : : Dears, : Hi, : According to my requirement I need to change the default behavior of Solr : for overwriting the whole document on unique-key duplication.
I am going : to : change that the overwrite just part of document (some fields) and other : parts of document (other fields) remain unchanged. First of all I need to : know such changing in Solr behavior is possible? Second, I really : appreciate if you can guide me through what class/classes should I : consider : for changing that? : Best regards. : : -- : A.Nazemian : : : : : : -- : A.Nazemian : -Hoss http://www.lucidworks.com/
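Hoss's two-step flow can be sketched as XML update messages — the document id and field names below are hypothetical, while the _version_ semantics (-1 = "only if the doc does NOT exist", 1 = "only if it DOES exist") and the update="set" atomic operation are the documented Solr behavior:

```xml
<!-- Step 1: plain add; _version_=-1 makes Solr reject it with HTTP 409
     if a document with this uniqueKey already exists -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title">first version</field>
    <field name="_version_">-1</field>
  </doc>
</add>

<!-- Step 2 (only on a 409): resend just the changed fields as atomic
     "set" operations; untouched fields keep their existing values -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title" update="set">new title</field>
    <field name="_version_">1</field>
  </doc>
</add>
```

Note that atomic updates require the unchanged fields to be stored (or otherwise recoverable), since Solr rebuilds the full document internally when applying the "set" operations.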
Hyphen in search keyword
I have the config below for the field type text_general. But when I search with a keyword, e.g. 100-001, I get 100-001 plus records starting with 100 and ending with 001. I want to treat '-' as an ordinary character, not a split point. <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <charFilter class="solr.HTMLStripCharFilterFactory" /> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> <filter class="solr.PorterStemFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> <analyzer type="query"> <charFilter class="solr.HTMLStripCharFilterFactory" /> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.PorterStemFilterFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/> </analyzer> </fieldType> Thanks Ravi
Re: Hyphen in search keyword
The word delimiter filter has a "types" parameter where you specify a file that can map hyphen to alpha or numeric. There is an example in my e-book. -- Jack Krupansky -Original Message- From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) Sent: Tuesday, July 8, 2014 2:18 PM To: solr-user@lucene.apache.org Subject: Hyphen in search keyword I have the below config for the field type text_general. But then I search with keyword e.g 100-001, it get 100-001, 100 in starting records ending with 001. I want to treat - as another character not to split. [fieldType definition quoted above] Thanks Ravi
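A sketch of the "types" mapping Jack describes — the filename wdfftypes.txt is arbitrary, while the types attribute and the character-class mapping syntax belong to WordDelimiterFilterFactory:

```xml
<!-- wdfftypes.txt (any name works) would contain the single mapping:
       - => ALPHA
     so the hyphen is classified as a letter and "100-001" is no longer
     split at the hyphen. The same file must be referenced in BOTH the
     index and query analyzers so the two stay consistent. -->
<filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt"
        generateWordParts="0" generateNumberParts="0"
        catenateWords="1" catenateNumbers="1" catenateAll="0"/>
```

After changing the analysis chain, the field must be reindexed for existing documents to pick up the new tokenization.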
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
On 7/8/2014 1:53 AM, Harald Kirsch wrote: Hi all, This is what happens when I run a regular wget query to log the current number of documents indexed: 2014-07-08:07:23:28 QTime=20 numFound=5720168 2014-07-08:07:24:28 QTime=12 numFound=5721126 2014-07-08:07:25:28 QTime=19 numFound=5721126 2014-07-08:07:27:18 QTime=50071 numFound=5721126 2014-07-08:07:29:08 QTime=50058 numFound=5724494 2014-07-08:07:30:58 QTime=50033 numFound=5730710 2014-07-08:07:31:58 QTime=13 numFound=5730710 2014-07-08:07:33:48 QTime=50065 numFound=5734069 2014-07-08:07:34:48 QTime=16 numFound=5737742 2014-07-08:07:36:38 QTime=50037 numFound=5737742 2014-07-08:07:37:38 QTime=12 numFound=5738190 2014-07-08:07:38:38 QTime=23 numFound=5741208 2014-07-08:07:40:29 QTime=50034 numFound=5742067 2014-07-08:07:41:29 QTime=12 numFound=5742067 2014-07-08:07:42:29 QTime=17 numFound=5742067 2014-07-08:07:43:29 QTime=20 numFound=5745497 2014-07-08:07:44:29 QTime=13 numFound=5745981 2014-07-08:07:45:29 QTime=23 numFound=5746420 As you can see, the QTime is just over 50 seconds at irregular intervals. This happens independent of whether I am indexing documents with around 20 dps or not. First I thought about a dependence on the auto-commit of 5 minutes, but the 50-second hits are too irregular. Furthermore, and this is *really strange*: when hooking strace on the solr process, the 50 seconds QTimes disappear completely and consistently --- a real Heisenbug. Nevertheless, strace shows that there is a socket timeout of 50 seconds defined in calls like this: [pid 1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1, 5) = 1 ([{fd=96, revents=POLLIN}]) 0.40 where the fd=96 is the result of [pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET, sin_port=htons(57236), sin_addr=inet_addr(ip address of local host)}, [16]) = 96 0.54 where again fd=122 is the TCP port on which solr was started. My hunch is that this is communication between the cores of solr.
I tried to search the internet for such a strange connection between socket timeouts and strace, but could not find anything (the Stack Overflow entry from yesterday is my own :-( ). This smells a bit like a race condition/deadlock kind of thing which is broken up by timing differences introduced by stracing the process. Any hints appreciated. For completeness, here is my setup: - solr-4.8.1, - cloud version running - 10 shards on 10 cores in one instance - hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11, PATCHLEVEL 2 - hosted on a vmware, 4 CPU cores, 16 GB RAM - single digit million docs indexed, exact number does not matter - zero query load Long GC pauses would also be my first guess. DNS problems on the inter-server communication for SolrCloud would be a second guess. If it's not one of these, then I really have no idea. http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems http://serverfault.com/questions/339791/5-second-resolving-delay Thanks, Shawn
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
Local disks or shared network disks? --wunder On Jul 8, 2014, at 11:43 AM, Shawn Heisey s...@elyograg.org wrote: [...]
SOLR Talk at AOL Dulles Campus.
All, There is a tech talk on AOL Dulles campus tomorrow. Do swing by if you can and share it with your colleagues and friends. www.meetup.com/Code-Brew/events/192361672/ There will be free food and beer served at this event :) Thanks, Rishi.
RE: [Solr Schema API] SolrJ Access
Alessandro, I just got this to work myself:

public static final String DEFINED_FIELDS_API = "/schema/fields";
public static final String DYNAMIC_FIELDS_API = "/schema/dynamicfields";
...
// just get a connection to Solr as usual (the factory is mine - it will use
// CloudSolrServer or HttpSolrServer depending on if we're using SolrCloud or not)
SolrClient client = SolrClientFactory.getSolrClientInstance(CLOUD_ENABLED);
SolrServer solrConn = client.getConnection(SOLR_URL, collection);
SolrQuery query = new SolrQuery();
if (dynamicFields)
    query.setRequestHandler(DYNAMIC_FIELDS_API);
else
    query.setRequestHandler(DEFINED_FIELDS_API);
query.setParam("showDefaults", true);
QueryResponse response = solrConn.query(query);

Then you've got to parse the response using NamedList etc. etc.

-Original Message- From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] Sent: Tuesday, July 08, 2014 5:54 AM To: solr-user@lucene.apache.org Subject: [Solr Schema API] SolrJ Access

Hi guys, wondering if there is any proper way to access the Schema API via SolrJ. Of course it is possible to reach them in Java with a specific Http Request, but in this way, using SolrCloud for example, we become coupled to one specific instance (and we don't want that). Code example:

HttpResponse httpResponse;
String url = this.solrBase + "/" + core + SCHEMA_SOLR_FIELDS_ENDPOINT + fieldName;
HttpPut httpPut = new HttpPut(url);
StringEntity entity = new StringEntity(
    "{\"type\":\"text_general\",\"stored\":\"true\"}",
    ContentType.APPLICATION_JSON);
httpPut.setEntity(entity);
HttpClient client = new DefaultHttpClient();
response = client.execute(httpPut);

Any suggestions? In my opinion it would be interesting to have some auxiliary method in SolrServer if it's not there yet. Cheers -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Solr atomic updates question
Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no saving. I am actually thinking that there will be a performance penalty, since atomic update requires Solr to first retrieve all the fields before updating. Bill
What does the getSearcher method of SolrQueryRequest mean?
Hello there, I'm using a project named LIRE for image retrieval based on the Solr platform. There is part of the code which I can't understand, so maybe you could help me. The project implements a request handler named lireq:

public class LireRequestHandler extends RequestHandlerBase

The search method in this handler is computed from a Lucene search + reranking. The first part goes like this:

public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    ...
    BooleanQuery query = new BooleanQuery();
    for (int i = 0; i < numHashes; i++) {
        query.add(new BooleanClause(new TermQuery(new Term(paramField,
                Integer.toHexString(hashes[i]))), BooleanClause.Occur.SHOULD));
    }
    SolrIndexSearcher searcher = req.getSearcher();
    TopDocs docs = searcher.search(query, candidateResultNumber);
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
Sure sounds like a socket bug, doesn't it? I turn to tcpdump when Solr starts behaving strangely in a socket-related way. Knowing exactly what's happening at the transport level is worth a month of guessing and poking.

On Jul 8, 2014, at 3:53 AM, Harald Kirsch harald.kir...@raytion.com wrote:

Hi all, This is what happens when I run a regular wget query to log the current number of documents indexed:

2014-07-08:07:23:28 QTime=20 numFound=5720168
2014-07-08:07:24:28 QTime=12 numFound=5721126
2014-07-08:07:25:28 QTime=19 numFound=5721126
2014-07-08:07:27:18 QTime=50071 numFound=5721126
2014-07-08:07:29:08 QTime=50058 numFound=5724494
2014-07-08:07:30:58 QTime=50033 numFound=5730710
2014-07-08:07:31:58 QTime=13 numFound=5730710
2014-07-08:07:33:48 QTime=50065 numFound=5734069
2014-07-08:07:34:48 QTime=16 numFound=5737742
2014-07-08:07:36:38 QTime=50037 numFound=5737742
2014-07-08:07:37:38 QTime=12 numFound=5738190
2014-07-08:07:38:38 QTime=23 numFound=5741208
2014-07-08:07:40:29 QTime=50034 numFound=5742067
2014-07-08:07:41:29 QTime=12 numFound=5742067
2014-07-08:07:42:29 QTime=17 numFound=5742067
2014-07-08:07:43:29 QTime=20 numFound=5745497
2014-07-08:07:44:29 QTime=13 numFound=5745981
2014-07-08:07:45:29 QTime=23 numFound=5746420

As you can see, the QTime is just over 50 seconds at irregular intervals. This happens independent of whether I am indexing documents with around 20 dps or not. First I thought about a dependence on the auto-commit of 5 minutes, but the 50-second hits are too irregular. Furthermore, and this is *really strange*: when hooking strace on the solr process, the 50-second QTimes disappear completely and consistently --- a real Heisenbug.

Nevertheless, strace shows that there is a socket timeout of 50 seconds defined in calls like this:

[pid 1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1, 5) = 1 ([{fd=96, revents=POLLIN}]) 0.40

where the fd=96 is the result of

[pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET, sin_port=htons(57236), sin_addr=inet_addr(ip address of local host)}, [16]) = 96 0.54

where again fd=122 is the TCP port on which solr was started. My hunch is that this is communication between the cores of solr. I tried to search the internet for such a strange connection between socket timeouts and strace, but could not find anything (the stackoverflow entry from yesterday is my own :-( ). This smells a bit like a race condition/deadlock kind of thing which is broken up by timing differences introduced by stracing the process. Any hints appreciated. For completeness, here is my setup:

- solr-4.8.1,
- cloud version running
- 10 shards on 10 cores in one instance
- hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11, PATCHLEVEL 2
- hosted on a vmware, 4 CPU cores, 16 GB RAM
- single digit million docs indexed, exact number does not matter
- zero query load

Harald.
Re: What does the getSearcher method of SolrQueryRequest mean?
(Sorry - my mail was sent half ready) hashes is an array of hash values generated somehow from the image. So my question is: what is the query being done in this part? I tried to reconstruct it on my own, by constructing a select query with the hash values separated by OR, but the results were different. Can anyone tell me why? This is where the source code is: http://code.google.com/p/lire/

On Wed, Jul 9, 2014 at 1:29 AM, Yossi Biton yossibi...@gmail.com wrote:

Hello there, I'm using a project named LIRE for image retrieval based on the Solr platform. There is part of the code which I can't understand, so maybe you could help me. The project implements a request handler named lireq:

public class LireRequestHandler extends RequestHandlerBase

The search method in this handler is computed from a Lucene search + reranking. The first part goes like this:

public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    ...
    BooleanQuery query = new BooleanQuery();
    for (int i = 0; i < numHashes; i++) {
        query.add(new BooleanClause(new TermQuery(new Term(paramField,
                Integer.toHexString(hashes[i]))), BooleanClause.Occur.SHOULD));
    }
    SolrIndexSearcher searcher = req.getSearcher();
    TopDocs docs = searcher.search(query, candidateResultNumber);

-- יוסי
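[Editor's note: a sketch, not from the thread.] The loop in the handler ORs together one TermQuery per hex-encoded hash, so a hand-built select query has to encode each hash the way Java's Integer.toHexString does: lowercase hex of the value treated as an unsigned 32-bit int. The field name and hash values below are placeholders:

```python
def hash_query(field, hashes):
    """Build the Lucene query string equivalent to the handler's BooleanQuery:
    one term per hash, hex-encoded like Java's Integer.toHexString (unsigned
    32-bit, lowercase), OR-ed together with SHOULD semantics."""
    terms = ["%s:%s" % (field, format(h & 0xFFFFFFFF, "x")) for h in hashes]
    return " OR ".join(terms)

# Placeholder field name and hash values, for illustration only.
print(hash_query("hashes_field", [255, 4096]))
# -> hashes_field:ff OR hashes_field:1000
```

If a manually constructed OR query returns different results, one thing worth checking is that the hex encoding matches this unsigned lowercase form (e.g. negative Java ints must not come out with a minus sign).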
Re: Solr atomic updates question
Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched doc, then reindex. Whether you use atomic updates or send the entire doc to Solr, it has to deleteById then add. The perf difference between the atomic updates and normal updates is likely minimal. Atomic updates are for when you have changes and want to apply them to a document without affecting the other fields. A regular add will replace an existing document completely. AFAIK Solr will let you mix atomic updates with regular field values, but I don't think it's a good idea. Steve On Jul 8, 2014, at 5:30 PM, Bill Au bill.w...@gmail.com wrote: Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no saving. I am actually thinking that there will be a performance penalty since atomic update requires Solr to first retrieve all the fields first before updating. Bill
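[Editor's note: an illustrative sketch, not from the thread.] The set/add modifiers Steve describes appear in the update payload as nested objects per field. Assuming hypothetical field names, an atomic-update JSON body POSTed to /update (Content-type: application/json) would look roughly like this:

```python
import json

# Hypothetical document: only "office" and "skills" are touched; other
# stored fields on the existing doc are left unchanged by the atomic update.
doc = {
    "id": "05991",
    "office": {"set": "Walla Walla"},  # "set" replaces the field's value
    "skills": {"add": "Python"},       # "add" appends to a multi-valued field
}
payload = json.dumps([doc])  # /update accepts a JSON array of documents
print(payload)
```

Under the hood this still triggers the fetch-modify-reindex cycle described above; the atomicity is about not clobbering the untouched fields, not about skipping the reindex.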
Re: Solr atomic updates question
Thanks for that under-the-cover explanation. I am not sure what you mean by mix atomic updates with regular field values. Can you give an example? Thanks. Bill On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay st...@b.abbies.us wrote: Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched doc, then reindex. Whether you use atomic updates or send the entire doc to Solr, it has to deleteById then add. The perf difference between the atomic updates and normal updates is likely minimal. Atomic updates are for when you have changes and want to apply them to a document without affecting the other fields. A regular add will replace an existing document completely. AFAIK Solr will let you mix atomic updates with regular field values, but I don't think it's a good idea. Steve On Jul 8, 2014, at 5:30 PM, Bill Au bill.w...@gmail.com wrote: Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no saving. I am actually thinking that there will be a performance penalty since atomic update requires Solr to first retrieve all the fields first before updating. Bill
Re: Solr atomic updates question
Take a look at this update XML:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>

Let's say employeeId is the key. If there's a fourth field, salary, on the existing doc, should it be deleted or retained? With this update it will obviously be deleted:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
  </doc>
</add>

With this XML it will be retained:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>

I'm not willing to guess what will happen in the case where non-atomic and atomic updates are present on the same add because I haven't looked at that code since 4.0, but I think I could make a case for retaining salary or for discarding it. That by itself reeks--and it's also not well documented. Relying on iffy, poorly-documented behavior is asking for pain at upgrade time.

Steve

On Jul 8, 2014, at 7:02 PM, Bill Au bill.w...@gmail.com wrote: Thanks for that under-the-cover explanation. I am not sure what you mean by mix atomic updates with regular field values. Can you give an example? Thanks. Bill On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay st...@b.abbies.us wrote: Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched doc, then reindex. Whether you use atomic updates or send the entire doc to Solr, it has to deleteById then add. The perf difference between the atomic updates and normal updates is likely minimal. Atomic updates are for when you have changes and want to apply them to a document without affecting the other fields. A regular add will replace an existing document completely. AFAIK Solr will let you mix atomic updates with regular field values, but I don't think it's a good idea.

Steve On Jul 8, 2014, at 5:30 PM, Bill Au bill.w...@gmail.com wrote: Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no saving. I am actually thinking that there will be a performance penalty since atomic update requires Solr to first retrieve all the fields before updating. Bill
Re: Solr atomic updates question
I see what you mean now. Thanks for the example. It makes things very clear. I have been thinking about the explanation in the original response more. According to that, both a regular update with the entire doc and an atomic update involve a delete by id followed by an add. But the Solr reference doc (https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents) says that: The first is *atomic updates*. This approach allows changing only one or more fields of a document without having to re-index the entire document. But since Solr is doing a delete by id followed by an add, does "without having to re-index the entire document" apply to the client side only? On the server side the add means that the entire document is re-indexed, right? Bill

On Tue, Jul 8, 2014 at 7:32 PM, Steve McKay st...@b.abbies.us wrote: Take a look at this update XML:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>

Let's say employeeId is the key. If there's a fourth field, salary, on the existing doc, should it be deleted or retained? With this update it will obviously be deleted:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
  </doc>
</add>

With this XML it will be retained:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>

I'm not willing to guess what will happen in the case where non-atomic and atomic updates are present on the same add because I haven't looked at that code since 4.0, but I think I could make a case for retaining salary or for discarding it. That by itself reeks--and it's also not well documented. Relying on iffy, poorly-documented behavior is asking for pain at upgrade time.

Steve

On Jul 8, 2014, at 7:02 PM, Bill Au bill.w...@gmail.com wrote: Thanks for that under-the-cover explanation. I am not sure what you mean by mix atomic updates with regular field values. Can you give an example? Thanks. Bill On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay st...@b.abbies.us wrote: Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched doc, then reindex. Whether you use atomic updates or send the entire doc to Solr, it has to deleteById then add. The perf difference between the atomic updates and normal updates is likely minimal. Atomic updates are for when you have changes and want to apply them to a document without affecting the other fields. A regular add will replace an existing document completely. AFAIK Solr will let you mix atomic updates with regular field values, but I don't think it's a good idea. Steve On Jul 8, 2014, at 5:30 PM, Bill Au bill.w...@gmail.com wrote: Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no saving. I am actually thinking that there will be a performance penalty since atomic update requires Solr to first retrieve all the fields before updating. Bill
fix wiki error
The url for solr atomic update documentation should contain json in the end. Here is the page - https://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example curl http://localhost:8983/solr/update/*json* -H 'Content-type:application/json'
Re: fix wiki error
Why do you think so? As of Solr 4, the CSV and JSON handlers have been unified in the general update handler, and /update/json is there for legacy reasons. The example should work. If it is not working for you, it might be for a different reason. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Wed, Jul 9, 2014 at 9:56 AM, Susmit Shukla shukla.sus...@gmail.com wrote: The url for solr atomic update documentation should contain json in the end. Here is the page - https://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example curl http://localhost:8983/solr/update/*json* -H 'Content-type:application/json'
Add a new replica to SolrCloud
Hi, I am currently using Solr 4.7.2 and have a SolrCloud setup running on 2 servers with number of shards as 2, replication factor as 2 and max shards per node as 4. Now, I want to add another server to the SolrCloud as a replica. I can see the Collection API to add a new replica, but that was added in Solr 4.8. Is there some way to add a new replica in Solr 4.7.2? -- Thanks Varun Gupta
Synchronising two masters
Hi, Our Solr setup consists of 2 Masters and 2 Slaves. The slaves would point to any one of the Masters through a load balancer and replicate the data. Master1 (M1) is the primary indexer. I send data to M1. In case M1 fails, I have a failover master, M2, and that would be indexing the data. The problem is, once Master1 comes up, how to synchronize M1 and M2? SolrCloud would be the option rather than going with this setup. But, currently we want it to be implemented in Master-Slave mode. Any suggestions? Thanks, Prasi
Re: Parallel optimize of index on SolrCloud.
Thanks Walter for your inputs. Our use case and performance benchmark requires us to invoke optimize. Here we see a chance of improvement in performance of optimize() if invoked in parallel. I found that if *distrib=false* is used, the optimization will happen in parallel. But I could not find a way to set it using HttpSolrServer/CloudSolrServer. Also the parameter setting as given in my mail above does not seem to work. Please let me know in what ways I can achieve the parallel optimize on SolrCloud. Thanks, Modassar

On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood wun...@wunderwood.org wrote: You probably do not need to force merge (mistakenly called optimize) your index. Solr does automatic merges, which work just fine. There are only a few situations where a forced merge is even a good idea. The most common one is a replicated (non-cloud) setup with a full reindex every night. If you need Solr Cloud, I cannot think of a situation where you would want a forced merge. wunder

On Jul 8, 2014, at 2:01 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Need to optimize an index created using CloudSolrServer APIs under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried making three separate HttpSolrServer instances and invoked httpSolrServer.optimize() on them in parallel, but it still seems to be doing the optimization sequentially. I tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential.

*URL*: http://host:port/solr/collection/update
*Parameters*:
params.add(new BasicNameValuePair("optimize", "true"));
params.add(new BasicNameValuePair("maxSegments", "1"));
params.add(new BasicNameValuePair("waitFlush", "true"));
params.add(new BasicNameValuePair("distrib", "false"));

Kindly provide your suggestion and help. Regards, Modassar
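[Editor's note: a sketch of the approach discussed above, not a tested recipe.] With distrib=false, the optimize request is addressed to one core at a time, so one request per shard core can be built and then fired concurrently (e.g. one thread per core). Host and core names below are placeholders; this only constructs the URLs:

```python
from urllib.parse import urlencode

def optimize_url(host, core):
    """Build a per-core optimize request carrying distrib=false, mirroring the
    parameters from the mail above, as query parameters on /update."""
    params = urlencode({
        "optimize": "true",
        "maxSegments": "1",
        "waitFlush": "true",
        "distrib": "false",   # keep the request local to this core
    })
    return "http://%s/solr/%s/update?%s" % (host, core, params)

# Placeholder core names, one per shard.
cores = ["collection1_shard1_replica1", "collection1_shard2_replica1"]
for url in (optimize_url("host:8983", c) for c in cores):
    print(url)
```

Each URL would then be issued by a separate HttpSolrServer (or plain HTTP client) so the merges run concurrently rather than through the collection-level endpoint.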
Planning ahead for Solr Cloud and Scaling
I'm working on a product hosted with AWS that uses Elastic Beanstalk auto-scaling to good effect and we are trying to set up similar (more or less) runtime scaling support with Solr. I think I understand how to set this up, and wanted to check I was on the right track. We currently run 3 cores on a single host / Solr server / shard. This is just fine for now, and we have overhead for the near future. However, I need to have a plan, and then test, for a higher capacity future. 1) I gather that if I set up SolrCloud, and then later load increases, I can spin up a second host / Solr server, create a new shard, and then split the first shard: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3 And doing this, we no longer have to commit to shards out of the gate. 2) I'm not clear whether there's a big advantage splitting up the cores or not. Two of the three cores will have about the same number of documents, though only one contains large amounts of text. The third core is much smaller in both bytes and documents (2 orders of magnitude). 3) We are also looking at moving multi-lingual. The current plan is to store the localized text in fields within the same core. The languages will be added over time. We can update the schema (as each will be optional). This seems easier than adding a core for each language. Is there a downside? Thanks for any pointers.
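[Editor's note: an illustrative sketch for item 1 above.] The shard split mentioned there is a Collections API call with action=SPLITSHARD. Host, collection, and shard names below are placeholders; this only builds the request URL:

```python
from urllib.parse import urlencode

def splitshard_url(host, collection, shard):
    """Build the Collections API request that splits an existing shard in two,
    after which the sub-shards can be moved onto the newly added node."""
    params = urlencode({
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
    })
    return "http://%s/solr/admin/collections?%s" % (host, params)

# Placeholder names, for illustration only.
print(splitshard_url("host:8983", "products", "shard1"))
```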
Re: Add a new replica to SolrCloud
Yes, you can just call a Core Admin CREATE on the new node with the collection name and optionally the shard name. On Wed, Jul 9, 2014 at 9:46 AM, Varun Gupta varun.vgu...@gmail.com wrote: Hi, I am currently using Solr 4.7.2 and have a SolrCloud setup running on 2 servers with number of shards as 2, replication factor as 2 and max shards per node as 4. Now, I want to add another server to the SolrCloud as a replica. I can see the Collection API to add a new replica, but that was added in Solr 4.8. Is there some way to add a new replica in Solr 4.7.2? -- Thanks Varun Gupta -- Regards, Shalin Shekhar Mangar.
Re: Add a new replica to SolrCloud
Yes, there is a way. On the node on which the replica needs to be created, hit:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=corename&collection=collectionname&shard=shardid'

For example:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&collection=collection1&shard=shard2'

See http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin for details. Thanks, Himanshu

On Wed, Jul 9, 2014 at 9:46 AM, Varun Gupta varun.vgu...@gmail.com wrote: Hi, I am currently using Solr 4.7.2 and have a SolrCloud setup running on 2 servers with number of shards as 2, replication factor as 2 and max shards per node as 4. Now, I want to add another server to the SolrCloud as a replica. I can see the Collection API to add a new replica, but that was added in Solr 4.8. Is there some way to add a new replica in Solr 4.7.2? -- Thanks Varun Gupta
Re: Parallel optimize of index on SolrCloud.
I seriously doubt that you are required to force merge. How much improvement? And is the big performance cost also OK? I have worked on search engines that do automatic merges and offer forced merges for over fifteen years. For all that time, forced merges have usually caused problems. Stop doing forced merges. wunder

On Jul 8, 2014, at 10:09 PM, Modassar Ather modather1...@gmail.com wrote: Thanks Walter for your inputs. Our use case and performance benchmark requires us to invoke optimize. Here we see a chance of improvement in performance of optimize() if invoked in parallel. I found that if *distrib=false* is used, the optimization will happen in parallel. But I could not find a way to set it using HttpSolrServer/CloudSolrServer. Also the parameter setting as given in my mail above does not seem to work. Please let me know in what ways I can achieve the parallel optimize on SolrCloud. Thanks, Modassar

On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood wun...@wunderwood.org wrote: You probably do not need to force merge (mistakenly called optimize) your index. Solr does automatic merges, which work just fine. There are only a few situations where a forced merge is even a good idea. The most common one is a replicated (non-cloud) setup with a full reindex every night. If you need Solr Cloud, I cannot think of a situation where you would want a forced merge. wunder

On Jul 8, 2014, at 2:01 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Need to optimize an index created using CloudSolrServer APIs under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried making three separate HttpSolrServer instances and invoked httpSolrServer.optimize() on them in parallel, but it still seems to be doing the optimization sequentially. I tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential.

*URL*: http://host:port/solr/collection/update
*Parameters*:
params.add(new BasicNameValuePair("optimize", "true"));
params.add(new BasicNameValuePair("maxSegments", "1"));
params.add(new BasicNameValuePair("waitFlush", "true"));
params.add(new BasicNameValuePair("distrib", "false"));

Kindly provide your suggestion and help. Regards, Modassar

-- Walter Underwood wun...@wunderwood.org
Re: Parallel optimize of index on SolrCloud.
Our index has almost 100M documents running on a SolrCloud of 3 shards, and each shard has an index size of about 700GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend and during the week there are no updates made to the index. Most of the queries that we run are pretty complex with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc. and take many minutes to execute. A difference of 10-20% is also a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but it does not work well. Kindly provide your suggestion. Thanks, Modassar

On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood wun...@wunderwood.org wrote: I seriously doubt that you are required to force merge. How much improvement? And is the big performance cost also OK? I have worked on search engines that do automatic merges and offer forced merges for over fifteen years. For all that time, forced merges have usually caused problems. Stop doing forced merges. wunder

On Jul 8, 2014, at 10:09 PM, Modassar Ather modather1...@gmail.com wrote: Thanks Walter for your inputs. Our use case and performance benchmark requires us to invoke optimize. Here we see a chance of improvement in performance of optimize() if invoked in parallel. I found that if *distrib=false* is used, the optimization will happen in parallel. But I could not find a way to set it using HttpSolrServer/CloudSolrServer. Also the parameter setting as given in my mail above does not seem to work. Please let me know in what ways I can achieve the parallel optimize on SolrCloud. Thanks, Modassar

On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood wun...@wunderwood.org wrote: You probably do not need to force merge (mistakenly called optimize) your index. Solr does automatic merges, which work just fine. There are only a few situations where a forced merge is even a good idea. The most common one is a replicated (non-cloud) setup with a full reindex every night. If you need Solr Cloud, I cannot think of a situation where you would want a forced merge. wunder

On Jul 8, 2014, at 2:01 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Need to optimize an index created using CloudSolrServer APIs under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried making three separate HttpSolrServer instances and invoked httpSolrServer.optimize() on them in parallel, but it still seems to be doing the optimization sequentially. I tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential.

*URL*: http://host:port/solr/collection/update
*Parameters*:
params.add(new BasicNameValuePair("optimize", "true"));
params.add(new BasicNameValuePair("maxSegments", "1"));
params.add(new BasicNameValuePair("waitFlush", "true"));
params.add(new BasicNameValuePair("distrib", "false"));

Kindly provide your suggestion and help. Regards, Modassar

-- Walter Underwood wun...@wunderwood.org