Re: Is it OK to have very big number of fields in solr/lucene ?

2014-08-11 Thread Lisheng Zhang
Thanks for the help!

The use case is that there is a file to which different users have
different assessments. If it were only for filtering, I could create one
field, like status_field, with values like

user1_important user2_important user3_unimportant ...

then use a filter to get all files that are important to user1, like
status_field:user1_important

But the tough part is that we may need to sort files according to such a flag
for each user (I should have mentioned this in my last mail). My solution is to
add many fields to the file document, like

user1_status, user2_status, user3_status // value can be important or
unimportant

then I can sort files according to each user's assessment.
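
For illustration, a minimal SolrJ sketch of this approach (the status field
names are from the example above; the core URL and query are hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/files");
SolrQuery q = new SolrQuery("*:*");
// Only files user1 has actually assessed.
q.addFilterQuery("user1_status:[* TO *]");
// "important" happens to sort before "unimportant" lexicographically,
// so ascending order lists important files first.
q.addSort("user1_status", SolrQuery.ORDER.asc);
System.out.println(server.query(q).getResults());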

My concern is whether it would scale well with that many fields.

Best regards, Lisheng




On Sat, Aug 9, 2014 at 6:18 AM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 Lisheng Zhang [lz0522...@gmail.com] wrote:
 In our application there are many complicated filter conditions; very
 often those conditions are specific to each user (like whether or not a doc
 is important or already read by a user ...). Two possible solutions to
 implement those filters in lucene:

  1/ create many fields
  2/ create many collections (for each user, for example)

 3) Define a few fields or even just a single field where users can provide
 their special tags.

 misc_field:important
 misc_field:read
 misc_field:personal_letters
 misc_field:todo
 misc_field:very_important

 Chances are that there will be a lot of overlap of terms between the users
 (if not, please describe in more detail what the user-specific things are),
 so that the filter caches can be re-used between them.

 - Toke Eskildsen



Re: what os env you use to develop lucene or solr?

2014-08-11 Thread Toke Eskildsen
On Mon, 2014-08-11 at 03:49 +0200, rulinma wrote:
   I want to know: is Linux the best choice?

The only special OS-thing about Lucene/Solr is that you should use a
64-bit OS for proper memory mapping.

With that out of the way, the question becomes "What OS should one use
for development?", and that is extremely dependent on taste and context.

The three obvious choices are Windows, OS X and a Linux variant. I would
argue that OS X and Linux are fairly similar when developing in Java, as
file locking and scripting work the same way: if your server is OS X or
Linux, using either OS X or Linux for development makes for easier
testing. Likewise, if your server runs Windows, you might catch some
errors earlier by developing under Windows.

- Toke Eskildsen, State and University Library, Denmark




Re: what os env you use to develop lucene or solr?

2014-08-11 Thread Paul Libbrecht
I have been using Mac OS X for development for more than 10 years.
It is, by far, the most user-friendly Unix-based system.

Copy and paste works correctly from the terminal to the IDE.
Find in the terminal behaves nicely (really!).
This is kilometers away from X Windows terminals and megameters away from the
DOS shell (I admittedly have limited experience with both).

Lucene and Solr work flawlessly on all such systems. Just beware of using
network disks for the index file system.

paul




On 11 août 2014, at 03:49, rulinma ruli...@gmail.com wrote:

 Hi everybody,
 
  I want to know: is Linux the best choice? And what does Doug Cutting use:
 CentOS, Ubuntu, others, or even Mac?
 
  thanks.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/what-os-env-you-use-to-develop-lucene-or-solr-tp4152219.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to grab matching stats in Similarity class

2014-08-11 Thread Varun Thacker
On Thu, Aug 7, 2014 at 6:10 AM, Hafiz Mian M Hamid 
mianhami...@yahoo.com.invalid wrote:

 We're using solr 4.2.1 and use an extension of Lucene's DefaultSimilarity
 as our similarity class. I am trying to figure out how we could get hold of
 the matching stats (i.e. how many/which terms in the query matched on
 different fields in the retrieved document set) in our similarity class
 since we want to add some custom boost to our scoring function. The scoring
 logic needs to know the number of terms matched on each field in the query
 to determine the boost value.

The score is calculated on a per-field basis, so the similarity will
never know how many fields the term matched against.

In Solr, if you are using eDismax (
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
),
the same term is searched across all the fields individually and the
best score from the highest-scoring field is taken. This could address
your custom-logic requirement in a crude way.
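
As a rough sketch in SolrJ (the query, field names and boosts are
hypothetical), the relevant eDismax parameters look like this:

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("candid camera");
q.set("defType", "edismax");
q.set("qf", "title^3 content"); // each listed field is scored individually
q.set("tie", "0.0");            // 0.0 means only the best-scoring field counts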


 Basically we want our similarity class to be aware of the global matching
 stats even when scoring a single term in its TFIDFDocScorer.score() method.
 I was wondering how we could get hold of that information. It looks like
 the exactSimScorer() and sloppySimScorer() methods get an instance of
 AtomicReaderContext as a second parameter, but it doesn't look like we could
 retrieve matching stats from this object. Is there any other way we could
 make the similarity class aware of the global matching stats?

 I'd highly appreciate any help.

 Thanks,
 Hamid




-- 


Regards,
Varun Thacker
http://www.vthacker.in/


Re: Is it OK to have very big number of fields in solr/lucene ?

2014-08-11 Thread Toke Eskildsen
On Mon, 2014-08-11 at 09:44 +0200, Lisheng Zhang wrote:

[...]

 But tough part is that we may need to sort files according to such flag,
 for each user (I should have mentioned in last mail). My solution is to add
 many fields to file document, like
 
 user1_status, user2_status, user3_status // value can be important or
 unimportant
 
 then I can sort files according to each user's accessment.

Flip assessment and user name:

status:important_user1
status:unimportant_user2
status:customflag_user1
status:anothercustomflag_user2

That way sorting will work. Unfortunately there will be no re-use of the
filter cache between users, so you might want to disable filter caching.

If you also need to be able to filter on which documents a user has
assessed, then add a second field for that:

assessed_by:user1
assessed_by:user2
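
A small SolrJ sketch of filtering with these two fields (the values are the
hypothetical ones above):

SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("assessed_by:user1");       // which docs user1 has assessed
q.addFilterQuery("status:important_user1");  // user1's own importance flag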

 My concern is whether it would scale well with that many fields.

I have successfully experimented with 10,000 fields, but I doubt that it
will work with millions.

- Toke Eskildsen, State and University Library, Denmark




Solr search \ special cases

2014-08-11 Thread Shay Sofer
Hi,

I have some strange cases while searching with Solr.

I have docs with names like: rule #22, rule +33, rule %44.

When I search for #22 or %44 or +33, Solr brings me, as expected: rule #22,
rule +33 and rule %44.

But when appending a star (*) to each search (#22*, +33*, %44*), only the one
with the + sign brings back rule +33; all the others return nothing.
Can someone explain?

Thanks,
Shay.


Re: Solr search \ special cases

2014-08-11 Thread Harshvardhan Ojha
Hi Shay,

I believe + is treated as a space. Is it a REST call or an API? What is your
field type?
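
One thing to check (a hypothetical illustration): in a URL query string a
literal + decodes to a space, so it has to be percent-encoded when calling
Solr over HTTP:

String encoded = java.net.URLEncoder.encode("+33", "UTF-8"); // yields "%2B33"
// Hypothetical select URL; without encoding, q=name:+33 arrives as "name: 33".
String url = "http://localhost:8983/solr/select?q=name:" + encoded + "*";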

Regards
Harshvardhan Ojha


On Mon, Aug 11, 2014 at 4:04 PM, Shay Sofer sha...@checkpoint.com wrote:

 Hi,

 I have some strange cases while searching with Solr.

 I have docs with names like: rule #22, rule +33, rule %44.

 When I search for #22 or %44 or +33, Solr brings me, as expected: rule #22 and
 rule +33 and rule %44.

 But when appending a star (*) to each search (#22*, +33*, %44*), only the
 one with the + sign brings back rule +33; all the others return nothing.

 Can someone explain?

 Thanks,
 Shay.



RE: Solr search \ special cases

2014-08-11 Thread Shay Sofer
I call directly from the Solr web API. The field type is string.

Shouldn't * bring more results? This is a wildcard search; am I wrong?

Thanks!

-Original Message-
From: Harshvardhan Ojha [mailto:ojha.harshvard...@gmail.com] 
Sent: Monday, August 11, 2014 1:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr search \ special cases

Hi Shay,

I believe + is treated as a space. Is it a REST call or an API? What is your
field type?

Regards
Harshvardhan Ojha


On Mon, Aug 11, 2014 at 4:04 PM, Shay Sofer sha...@checkpoint.com wrote:

 [...]





Re: Solr search \ special cases

2014-08-11 Thread Jack Krupansky
The use of a wildcard suppresses analysis of the query term, so the special 
characters remain, but... they were removed when the terms were indexed, so 
no match. You must manually emulate the index term analysis in order to use 
wildcards.
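
For example, a sketch that assumes the index analysis lower-cases and strips
non-alphanumeric characters (adjust it to whatever your field's analyzer
actually does; the field name is hypothetical):

// Wildcard terms bypass query-time analysis, so transform the term the
// same way the index analyzer would before appending the star.
String raw = "#22";
String emulated = raw.replaceAll("[^\\p{Alnum}]", "").toLowerCase(); // "22"
String query = "name:" + emulated + "*"; // matches the indexed token "22"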


-- Jack Krupansky

-Original Message- 
From: Shay Sofer

Sent: Monday, August 11, 2014 6:34 AM
To: solr-user@lucene.apache.org
Subject: Solr search \ special cases

Hi,

I have some strange cases while searching with Solr.

I have docs with names like: rule #22, rule +33, rule %44.

When I search for #22 or %44 or +33, Solr brings me, as expected: rule #22 and
rule +33 and rule %44.

But when appending a star (*) to each search (#22*, +33*, %44*), only the one
with the + sign brings back rule +33; all the others return nothing.


Can someone explain?

Thanks,
Shay. 



RE: SqlEntityProcessor

2014-08-11 Thread Dyer, James
I've heard of a user adding a separate <entity> section to the end of their
data-config.xml with a SqlEntityProcessor and an UPDATE statement.  It would
run after your main <entity> section.  I have not tried it myself, and surely
DIH was not designed to do this, but it might work.

A better solution might be to write a class implementing EventListener that
does the DB update you want, and register it as an onImportEnd listener in your
configuration.  See
http://wiki.apache.org/solr/DataImportHandler#EventListeners for details.
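
A minimal sketch of that second approach (the class name and JDBC details are
hypothetical; register it via onImportEnd="com.example.MarkIndexedListener" on
the <document> element of data-config.xml):

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

public class MarkIndexedListener implements EventListener {
    @Override
    public void onEvent(Context ctx) {
        // Run the UPDATE that flags rows as indexed, e.g. via plain JDBC:
        // try (Connection c = dataSource.getConnection();
        //      Statement s = c.createStatement()) {
        //     s.executeUpdate("UPDATE items SET indexed = 1 WHERE indexed = 0");
        // }
    }
}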

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Christof Lorenz [mailto:loc...@web.de] 
Sent: Sunday, August 10, 2014 6:52 AM
To: solr-user@lucene.apache.org
Subject: SqlEntityProcessor

Hi folks,

I am searching for a way to update a certain column in the RDBMS for each
item as soon as the item has been indexed by Solr.
The column will be the indicator in the delta-query to select un-indexed
items.
We don't want to use the timestamp-based mechanism that is the default.

Any ideas how we could implement this ?

Regards,
Lochri




Re: SolrCloud Scale Struggle

2014-08-11 Thread Shawn Heisey
On 8/10/2014 11:07 PM, anand.mahajan wrote:
 Thank you for your suggestions. With the autoCommit (every 10 mins) and
 softCommit (every 10 secs) frequencies reduced, things work much better now.
 The CPU usage has gone down considerably (by about 60%) and the
 read/write throughput is showing considerable improvement too.
 
 There are certain shards that are giving poor response times (these have
 over 10M listings); I guess this is because they are starving
 for RAM? Would it help if I split them up into smaller shards, but with the
 existing set of hardware? (I cannot allocate more machines to the cloud as
 yet)

Memory requirements are actually likely to go *up* a little bit with
more shards on the same hardware, not down.  The ideal RAM setup is to
have enough RAM in the machine to equal or exceed the sum of your max
Solr heap and the size of all the index data on that machine.  This
allows the operating system to load the entire index into the disk
cache.  Disks are *slow*; if the OS can pull the data it needs out of
RAM (the operating system disk cache), access becomes *very* fast.
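
As a worked illustration with hypothetical numbers:

// Rule of thumb: ideal RAM >= Solr heap + on-disk index size on the host.
long solrHeapGB  = 8;                         // hypothetical -Xmx8g
long indexSizeGB = 60;                        // hypothetical index data on this host
long idealRamGB  = solrHeapGB + indexSizeGB;  // 68 GB to fully cache the index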

If you have enough RAM to load at least two thirds of the index size,
performance is likely to also be very good, but 100% of the index is better.

If at all possible, put more RAM in the machine.

Thanks,
Shawn



When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-11 Thread Harun Reşit Zafer

Hi,

In the following configuration, when I uncomment both the mm and
maxCollationTries lines and run a query on /select, Solr gets stuck
with no exception.

I tried different values for both parameters and found that values for
mm less than 40% still work.



<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
    -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <int name="timeAllowed">1000</int>
    <str name="qf">title^3 title_s^2 content</str>
    <str name="pf">title content</str>
    <str name="fl">id,title,content,score</str>
    <float name="tie">0.1</float>
    <str name="lowercaseOperators">true</str>
    <str name="stopwords">true</str>
    <!-- <str name="mm">75%</str> -->
    <int name="rows">10</int>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- <str name="spellcheck.collateParam.mm">100%</str> -->

    <str name="spellcheck.maxCollations">3</str>
  </lst>

  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Any idea? Thanks


--
Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W  http://www.hrzafer.com



RE: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-11 Thread Dyer, James
Harun,

Just to clarify, is this happening during startup when a warmup query is 
running, or is this once the server is fully started?  This might be another 
instance of https://issues.apache.org/jira/browse/SOLR-5386 .

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr] 
Sent: Monday, August 11, 2014 8:39 AM
To: solr-user@lucene.apache.org
Subject: When I use minimum match and maxCollationTries parameters together in 
edismax, Solr gets stuck

Hi,

In the following configuration, when I uncomment both the mm and
maxCollationTries lines and run a query on /select, Solr gets stuck
with no exception.

I tried different values for both parameters and found that values for
mm less than 40% still work.


[...]

Any idea? Thanks


-- 
Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W  http://www.hrzafer.com


Performance comparison of uploading SolrInputDocument vs JSONRequestHandler

2014-08-11 Thread georgelavash
I have a large number of documents that I am trying to load into Solr.

I am about to begin benchmarking this effort, but I thought I would ask here.
I have the documents in JSONArrays already.

I am most concerned with the ingest rate on the server, so I don't mind
performing extra work on the client to speed up the server...

Assuming I am using ConcurrentUpdateSolrServer, will I get better ingest
performance if I convert all my documents to SolrInputDocuments before sending,
or if I use the JsonRequestHandler on the server and send the JSONArrays via a
ContentStreamUpdateRequest?

Thanks, 
George 


Re: Performance comparison of uploading SolrInputDocument vs JSONRequestHandler

2014-08-11 Thread Erick Erickson
I think you're worrying about the wrong problem ;)

Often, the difference between the JSON and SolrInputDocument
decoding on the server is dwarfed by the time it takes the
client to assemble the docs to send. Quick test: When you start
indexing, how hard is the Solr server working (measure crudely
by looking at CPU utilization). Very frequently you'll find the server
sitting around waiting for the client to send documents.

Very often you'll get _much_ greater throughput gains by racking
N clients together all sending to the Solr server than you will get
by worrying about whether JSON or SolrInputDocument (or even
XML docs) is more efficient on the server.
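
For what it's worth, a minimal sketch of the concurrent client side (the URL,
queue size and thread count are hypothetical; tune them for your hardware):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Buffers adds in a queue and streams them with several background threads.
ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
        "http://localhost:8983/solr/collection1", 10000, 4);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
doc.addField("title", "example");
server.add(doc);               // returns quickly; sending happens in background
server.blockUntilFinished();   // flush before committing or shutting down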

That said, SolrInputDocuments are somewhat faster I think.

FWIW
Erick




On Mon, Aug 11, 2014 at 7:34 AM, georgelav...@comcast.net wrote:

 I have a large number of documents that I am trying to load into SOLR.

 I am about to begin benchmarking this effort, but I thought I would ask
 here. I have the documents in JSONArrays already.

 I am most concerned with ingest rate on the server. So I don't mind
 performing extra work on the client to speed up the server...

 Assuming I am using ConcurrentUpdateSolrServer, will I get better ingest
 performance if I convert all my documents to SolrInputDocuments before
 sending,
 or if I use the JsonRequestHandler on the server and send the JSONArrays
 via a ContentStreamUpdateRequest?

 Thanks,
 George



RES: SOLRJ Stop Streaming

2014-08-11 Thread Felipe Dantas de Souza Paiva
Hey Guys,

any ideas?

Thanks,
Felipe

From: Felipe Paiva
Sent: Wednesday, August 6, 2014, 4:40 PM
To: solr-user@lucene.apache.org
Subject: SOLRJ Stop Streaming

Hi Guys,

in version 4.0 of SolrJ, support for streaming responses was added:

https://issues.apache.org/jira/browse/SOLR-2112

In my application, the output of the Solr input stream is a response stream
from a REST web service.

It works fine, but if the client closes the connection with the REST server,
the Solr stream keeps running. As a result, CPU continues to be consumed even
though nothing is being delivered to the client.

Is there a way to force the Solr stream to be closed?

I think I would have to modify the class StreamingBinaryResponseParser by
adding a new method that checks whether the Solr stream should be closed.

Am I right? I am using version 4.1.0 of SolrJ.
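
For reference, this is roughly how I invoke the streaming API (a sketch; the
URL and query are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.StreamingResponseCallback;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
server.queryAndStreamResponse(new SolrQuery("*:*"), new StreamingResponseCallback() {
    @Override
    public void streamDocListInfo(long numFound, long start, Float maxScore) {
        // header info arrives before the documents
    }
    @Override
    public void streamSolrDocument(SolrDocument doc) {
        // each document is pushed here as it is read from the Solr stream;
        // this is where it gets forwarded to the REST response
    }
});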

Thank you all.
Cheers,

Felipe Paiva
UOL - Systems Analyst
Av. Brig. Faria Lima, 1384, 3° andar . 01452-002 . São Paulo/SP
Phone: 11 3092 6938





Need some help with solr not restarting

2014-08-11 Thread Mike Thomsen
I'm very new to SolrCloud. When I tried restarting our tomcat server
running SolrCloud, I started getting this in our logs:

SEVERE:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/configs/configuration1/default-collection/data/index/_3ts3_Lucene41_0.doc
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:407)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:404)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:314)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1325)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1327)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1327)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1327)
at 
org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:1099)
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:199)
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:74)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:206)
at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
at 
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:281)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:262)
at 
org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:107)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4797)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5473)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:634)
at 
org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1074)
at 
org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1858)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Aug 11, 2014 2:21:08 PM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start Solr. Check solr/home property and the logs
Aug 11, 2014 2:21:08 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.cloud.ZooKeeperException:
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:224)
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:74)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:206)
at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
at 
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:281)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:262)
at 
org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:107)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4797)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5473)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:634)
at 
org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1074)
at 
org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1858)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 

regexTransformer returns no results if there is no match

2014-08-11 Thread alxsss
Hello,

I am trying to construct a Wikipedia page URL from the page title using the
regexTransformer with

<field column="title_underscore" regex="\s+" replaceWith="_" sourceColName="title" />

This does not work for titles that have no space, so title_underscore for them
is empty.

Any ideas what is wrong here?

This is with solr-4.8.1

Thanks. Alex.


SolrCloud OOM Problem

2014-08-11 Thread dancoleman
My SolrCloud of 3 shards / 3 replicas is having a lot of OOM errors. Here are
some specs on my setup: 

hosts: all are EC2 m1.large with 250G data volumes
documents: 120M total
zookeeper: 5 external t1.micros

startup command with memory and GC values
===
root 12499 1 61 19:36 pts/001:49:18 /usr/lib/jvm/jre/bin/java
-XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms5120m -Xmx5120m -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC
-Djavax.sql.DataSource.Factory=org.apache.commons.dbcp.BasicDataSourceFactory
-DnumShards=3 -Dbootstrap_confdir=/data/solr/lighting_products/conf
-Dcollection.configName=lighting_products_cloud_conf 
-DzkHost=ec2-00-17-55-217.compute-1.amazonaws.com:2181,ec2-00-82-150-252.compute-1.amazonaws.com:2181,ec2-00-234-237-109.compute-1.amazonaws.com:2181,ec2-00-224-205-204.compute-1.amazonaws.com:2181,ec2-00-20-72-124.compute-1.amazonaws.com:2181
-classpath
:/usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
-Dcatalina.base=/usr/share/tomcat6 -Dcatalina.home=/usr/share/tomcat6
-Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat6/temp
-Djava.util.logging.config.file=/usr/share/tomcat6/conf/logging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
org.apache.catalina.startup.Bootstrap start


Linux top command output with no indexing
===
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
 8654 root  20   0 95.3g 6.4g 1.1g S 27.6 87.4  83:46.19 java


Linux top command output with indexing
===
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
12499 root  20   0 95.8g 5.8g 556m S 164.3 80.2 110:40.99 java


So it appears our indexing is clobbering the CPU but not the memory.

The user queries are pretty bad, and I will provide a few examples here.
Note that the sort date updated_dt is not the order in which documents are
indexed; the user application should be sorting on a
different date.
===
INFO: [lighting_products] webapp=/solr path=/select
params={facet=truesort=updated_dt+descf.content_videotype_s.facet.missing=truespellcheck.q=candid+cameranocache=1407790600582distrib=falseversion=2oe=UTF-8fl=id,scoredf=textshard.url=10.211.82.113:80/solr/lighting_products/|10.249.34.65:80/solr/lighting_products/NOW=1407790600883ie=UTF-8facet.field=content_videotype_sfq=my_database_s:trainingfq=my_server_s:mydomain\-arc\-v2.lightingservices.comfq=allnamespaces_s_mv:(happyland\-site+OR+tags+OR+predicates+OR+authorities+OR+happyland+OR+movies+OR+global+OR+devicegroup+OR+people+OR+entertainment)fq={!tag%3Dct0}_contenttype_s:Standard\:ShowVideofsv=truesite=solr_arc_restype=SOLRwt=javabindefType=dismaxrows=50start=0f.content_videotype_s.facet.limit=160q=candid+cameraq.op=ANDisShard=true}
hits=22 status=0 QTime=118

INFO: [lighting_products] webapp=/solr path=/select

Re: SolrCloud OOM Problem

2014-08-11 Thread Shawn Heisey
On 8/11/2014 5:27 PM, dancoleman wrote:
 My SolrCloud of 3 shards / 3 replicas is having a lot of OOM errors. Here are
 some specs on my setup: 

 hosts: all are EC2 m1.large with 250G data volumes
 documents: 120M total
 zookeeper: 5 external t1.micros

snip

 Linux top command output with no indexing
 ===
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
  8654 root  20   0 95.3g 6.4g 1.1g S 27.6 87.4  83:46.19 java


 Linux top command output with indexing
 ===
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
 12499 root  20   0 95.8g 5.8g 556m S 164.3 80.2 110:40.99 java

I think you're likely going to need a much larger heap than 5GB, or
you're going to need a lot more machines and shards, so that each
machine has a much smaller piece of the index.  The java heap is only
one part of the story here, though.

Solr performance is terrible when the OS cannot effectively cache the
index, because Solr must actually read the disk to get the data required
for a query.  Disks are incredibly SLOW.  Even SSD storage is a *lot*
slower than RAM.

Your setup does not have anywhere near enough memory for the size of
your shards.  Amazon's website says that the m1.large instance has 7.5GB
of RAM.  You're allocating 5GB of that to Solr (the java heap) according
to your startup options.  If you subtract a little more for the
operating system and basic system services, that leaves about 2GB of RAM
for the disk cache.  Based on the numbers from top, that Solr instance
is handling nearly 90GB of index.  2GB of RAM for caching is nowhere
near enough -- you will want between 32GB and 96GB of total RAM for that
much index.

http://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Thanks,
Shawn



Re: SolrCloud OOM Problem

2014-08-11 Thread dancoleman
90G is correct, each host is currently holding that much data.

Are you saying that 32GB to 96GB would be needed for each host? Assuming
we did not add more shards, that is.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-OOM-Problem-tp4152389p4152401.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud OOM Problem

2014-08-11 Thread Shawn Heisey
 90G is correct, each host is currently holding that much data.

 Are you saying that 32GB to 96GB would be needed for each host? Assuming
 we did not add more shards, that is.

If you want good performance and enough memory to give Solr the heap it
will need, yes. Lucene (the search API that Solr uses) relies on good
operating-system caching for the index. Having enough memory to cache the
ENTIRE index is not usually required, but it is recommended.

Alternatively, you can add a lot more hosts and create a new collection
with a lot more shards. The total memory requirement across the whole
cloud won't go down, but each host won't require as much.

Thanks,
Shawn




what's the difference between solr and elasticsearch in hdfs case?

2014-08-11 Thread Jianyi
Hi~

I'm new to both Solr and ElasticSearch. I have read that both
support creating indexes on HDFS.

So, what's the difference between Solr and ElasticSearch in the HDFS case?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-s-the-difference-between-solr-and-elasticsearch-in-hdfs-case-tp4152413.html
Sent from the Solr - User mailing list archive at Nabble.com.


SpatialForTimeDurations question

2014-08-11 Thread 小川修
Hello.
I am sorry for my bad English.

I am using Solr 4.7.1.
I want to run date-range queries against a multiValued field.

Then I found a solution:
http://wiki.apache.org/solr/SpatialForTimeDurations

The solution is almost perfect.

But with some values I got an error message:

#message
2014/08/10 23:28:12.558: ERROR: Fatal: 400: ERROR: [doc=12345] Error
adding field 'disp_date_range'='[20140418 201408201908]'
msg=Index: 0, Size: 0

#schema.xml

<fieldType name="date_range" class="solr.SpatialRecursivePrefixTreeFieldType"
           multiValued="true"
           geo="false"
           worldBounds="0 0 1231 1231"
           distErrPct="0"
           maxDistErr="1"
           units="degrees" />

#data
<doc>
  <field name="id">12345</field>
  <field name="disp_date_range" update="set">20140418 201408201908</field>
</doc>

I checked other values, and there are some that Solr cannot store.
For example:
202805041049 202805041049
201301041353 201301041353
200305281316 200305281316
200601011536 200601011536
203005271640 203005271640
201505211646 201505211646
202602071904 202602071904

Can anyone help, please?

Thanks.
Shu Ogawa


Re: what's the difference between solr and elasticsearch in hdfs case?

2014-08-11 Thread Alexandre Rafalovitch
Are you comparing Solr vs. ElasticSearch, or Cloudera vs.
ElasticSearch? Because Cloudera is also commercial, like ElasticSearch,
and has a full HDFS story.

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Tue, Aug 12, 2014 at 4:43 AM, Jianyi phoenix.w.2...@qq.com wrote:
 Hi~

 I'm new to both Solr and ElasticSearch. I have read that both
 support creating indexes on HDFS.

 So, what's the difference between Solr and ElasticSearch in the HDFS case?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/what-s-the-difference-between-solr-and-elasticsearch-in-hdfs-case-tp4152413.html
 Sent from the Solr - User mailing list archive at Nabble.com.