Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
I need to store in Solr all data of my clients' mailing activity.

The data contains metadata like From, To, Date, Time, Subject, etc.

I would easily have 1,000 million records every 2 months.

What I am currently doing is creating cores per client, so I have 400 cores
already.

Is this a good idea?

What is the general practice for creating cores?
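
For context: in a pre-SolrCloud setup like this, a per-client core is usually
created on the fly through the CoreAdmin API rather than by hand. A minimal
sketch (host, port, and names are illustrative; the instanceDir with its conf/
directory must already exist):

http://localhost:8983/solr/admin/cores?action=CREATE&name=client401&instanceDir=client401&config=solrconfig.xml&schema=schema.xml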


Re: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-12 Thread Harun Reşit Zafer
It happens once the server is fully started. When it gets stuck,
sometimes I have to restart the server; sometimes I'm able to edit the
solrconfig.xml and reload it.


Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W  http://www.hrzafer.com

On 11.08.2014 17:32, Dyer, James wrote:

Harun,

Just to clarify, is this happening during startup when a warmup query is 
running, or is this once the server is fully started?  This might be another 
instance of https://issues.apache.org/jira/browse/SOLR-5386 .

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr]
Sent: Monday, August 11, 2014 8:39 AM
To: solr-user@lucene.apache.org
Subject: When I use minimum match and maxCollationTries parameters together in 
edismax, Solr gets stuck

Hi,

In the following configuration, when I uncomment both the mm and
maxCollationTries lines and run a query on /select, Solr gets stuck
with no exception.

I tried different values for both parameters and found that values for
mm less than 40% still work.


<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <int name="timeAllowed">1000</int>
    <str name="qf">title^3 title_s^2 content</str>
    <str name="pf">title content</str>
    <str name="fl">id,title,content,score</str>
    <float name="tie">0.1</float>
    <str name="lowercaseOperators">true</str>
    <str name="stopwords">true</str>
    <!-- <str name="mm">75%</str> -->
    <int name="rows">10</int>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- <str name="spellcheck.collateParam.mm">100%</str> -->

    <str name="spellcheck.maxCollations">3</str>
  </lst>

  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Any idea? Thanks






Re: Can I use multiple cores

2014-08-12 Thread Anshum Gupta
Hi Ramprasad,

You can certainly have a system with hundreds of cores. I know of more than
a few people who have done that successfully in their setups.

At the same time, I'd also recommend having a look at SolrCloud.
SolrCloud takes away operational pains like replication/recovery etc.
to a major extent. I don't know about your security requirements and hard
bounds on that front, but also look at routing in SolrCloud to figure out
a multi-tenancy implementation here:
* SolrCloud Document Routing by Joel:
http://searchhub.org/2013/06/13/solr-cloud-document-routing/
* Multi-level composite-id routing in SolrCloud:
http://searchhub.org/2014/01/06/10590/
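
To make the routing idea concrete, here is a minimal sketch (IDs and field
names are illustrative): with the composite-id router you prefix each
document ID with the client key, which keeps one client's documents
together, and the _route_ parameter then restricts a query to that
client's shard:

<add>
  <doc>
    <field name="id">client042!msg-000123</field>
    <field name="subject">Quarterly report</field>
  </doc>
</add>

/select?q=subject:report&_route_=client042!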



On Mon, Aug 11, 2014 at 11:40 PM, Ramprasad Padmanabhan 
ramprasad...@gmail.com wrote:

 I need to store in Solr all data of my clients' mailing activity.

 The data contains metadata like From, To, Date, Time, Subject, etc.

 I would easily have 1,000 million records every 2 months.

 What I am currently doing is creating cores per client, so I have 400 cores
 already.

 Is this a good idea?

 What is the general practice for creating cores?




-- 

Anshum Gupta
http://www.anshumgupta.net


Re: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 08:40 +0200, Ramprasad Padmanabhan wrote:
 I need to store in Solr all data of my clients' mailing activity.
 
 The data contains metadata like From, To, Date, Time, Subject, etc.
 
 I would easily have 1,000 million records every 2 months.

If standard searches are always inside a single client's emails and never
across all cores, this should scale simply by adding new machines linearly
with corpus size.

 What I am currently doing is creating cores per client, so I have 400 cores
 already.
 
 Is this a good idea?

Yes. One core per client ensures that ranking works well. It makes it
easy to remove users, and if some of the users are inactive for long
periods of time, you can use dynamic loading of cores.
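
As a rough sketch, dynamic (transient) core loading is configured in the
legacy solr.xml format used by Solr 4.x along these lines (names and the
cache size are illustrative):

<solr persistent="true">
  <cores adminPath="/admin/cores" transientCacheSize="100">
    <!-- transient="true" plus loadOnStartup="false" means a core is
         loaded lazily on first request and may be evicted (LRU) once
         more than transientCacheSize transient cores are loaded -->
    <core name="client001" instanceDir="client001"
          transient="true" loadOnStartup="false"/>
    <core name="client002" instanceDir="client002"
          transient="true" loadOnStartup="false"/>
  </cores>
</solr>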

That is under the presumption that you will have a few thousand clients.
If your expected scale is millions, I am not sure it will work.

- Toke Eskildsen, State and University Library, Denmark




Re: Can I use multiple cores

2014-08-12 Thread Harshvardhan Ojha
I think this question is aimed more at the design and performance of a
large number of cores.
Solr is designed to handle multiple cores effectively; however, it would
be interesting to know if you have observed any performance problems as
the number of cores grows, along with your number of nodes and Solr version.

Regards
Harshvardhan Ojha


On Tue, Aug 12, 2014 at 12:33 PM, Anshum Gupta ans...@anshumgupta.net
wrote:

 Hi Ramprasad,

 You can certainly have a system with hundreds of cores. I know of more than
 a few people who have done that successfully in their setups.

 At the same time, I'd also recommend to you to have a look at SolrCloud.
 SolrCloud takes away the operational pains like replication/recovery etc.
 to a major extent. I don't know about your security requirements and hard
 bounds on that front but look at routing in SolrCloud to also figure out
 multi-tenancy implementation here:
 * SolrCloud Document Routing by Joel:
 http://searchhub.org/2013/06/13/solr-cloud-document-routing/
 * Multi-level composite-id routing in SolrCloud:
 http://searchhub.org/2014/01/06/10590/



 On Mon, Aug 11, 2014 at 11:40 PM, Ramprasad Padmanabhan 
 ramprasad...@gmail.com wrote:

  I need to store in Solr all data of my clients' mailing activity.
 
  The data contains metadata like From, To, Date, Time, Subject, etc.
 
  I would easily have 1,000 million records every 2 months.
 
  What I am currently doing is creating cores per client, so I have 400
  cores already.
 
  Is this a good idea?
 
  What is the general practice for creating cores?
 



 --

 Anshum Gupta
 http://www.anshumgupta.net



Re: what's the difference between solr and elasticsearch in hdfs case?

2014-08-12 Thread Jianyi
Hi Alex,

Thanks for your reply.

I'm comparing Solr vs. Elasticsearch.

Does Solr store its index on HDFS in raw Lucene format? I mean, if it
does, we can get the index files from HDFS and directly put them into an
application based on Lucene.

It seems that Elasticsearch does not store the raw Lucene index on HDFS
directly. It has its own special data structures and operations.





Re: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-12 Thread Harun Reşit Zafer
I tried again to make sure. The server starts and I can see the web admin
GUI, but I can't navigate between tabs; it just says "loading". On the
terminal console, however, everything seems normal.


Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W  http://www.hrzafer.com

On 12.08.2014 09:42, Harun Reşit Zafer wrote:
It happens once the server is fully started. When it gets stuck,
sometimes I have to restart the server; sometimes I'm able to edit the
solrconfig.xml and reload it.


Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W  http://www.hrzafer.com

On 11.08.2014 17:32, Dyer, James wrote:

Harun,

Just to clarify, is this happening during startup when a warmup query 
is running, or is this once the server is fully started? This might 
be another instance of https://issues.apache.org/jira/browse/SOLR-5386 .


James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr]
Sent: Monday, August 11, 2014 8:39 AM
To: solr-user@lucene.apache.org
Subject: When I use minimum match and maxCollationTries parameters 
together in edismax, Solr gets stuck


Hi,

In the following configuration, when I uncomment both the mm and
maxCollationTries lines and run a query on /select, Solr gets stuck
with no exception.

I tried different values for both parameters and found that values for
mm less than 40% still work.


<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <int name="timeAllowed">1000</int>
    <str name="qf">title^3 title_s^2 content</str>
    <str name="pf">title content</str>
    <str name="fl">id,title,content,score</str>
    <float name="tie">0.1</float>
    <str name="lowercaseOperators">true</str>
    <str name="stopwords">true</str>
    <!-- <str name="mm">75%</str> -->
    <int name="rows">10</int>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- <str name="spellcheck.collateParam.mm">100%</str> -->

    <str name="spellcheck.maxCollations">3</str>
  </lst>

  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Any idea? Thanks









Re: SolrCloud OOM Problem

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 01:27 +0200, dancoleman wrote:
 My SolrCloud of 3 shards / 3 replicas is having a lot of OOM errors. Here are
 some specs on my setup: 
 
 hosts: all are EC2 m1.large with 250G data volumes

Is that 3 (each running a primary and a replica shard) or 6 instances?

 documents: 120M total
 zookeeper: 5 external t1.micros

If your facet fields have many unique values and you have many
concurrent requests, then memory usage will be high. But by the looks of
it, I guess the facet fields have relatively few values?

Still, if you have many concurrent queries, you might consider using a
queue in front of your SolrCloud instead of just starting new requests,
in order to set an effective limit on heap usage. 

- Toke Eskildsen, State and University Library, Denmark




Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
Are there documented benchmarks for the number of cores?
As of now I just have a test bed.


We have 150 million records (will go up to 1,000 million), distributed in 400
cores.
On a single machine with 16GB RAM + 16 CPU cores, search is working fine.

But I am still not sure whether this will work fine in production.
Obviously I can always add more nodes to Solr, but I need to justify how
much I need.





On 12 August 2014 12:48, Harshvardhan Ojha ojha.harshvard...@gmail.com
wrote:

 I think this question is aimed more at the design and performance of a
 large number of cores.
 Solr is designed to handle multiple cores effectively; however, it would
 be interesting to know if you have observed any performance problems as
 the number of cores grows, along with your number of nodes and Solr version.

 Regards
 Harshvardhan Ojha


 On Tue, Aug 12, 2014 at 12:33 PM, Anshum Gupta ans...@anshumgupta.net
 wrote:

  Hi Ramprasad,
 
  You can certainly have a system with hundreds of cores. I know of more
 than
  a few people who have done that successfully in their setups.
 
  At the same time, I'd also recommend to you to have a look at SolrCloud.
  SolrCloud takes away the operational pains like replication/recovery etc.
  to a major extent. I don't know about your security requirements and hard
  bounds on that front but look at routing in SolrCloud to also figure out
  multi-tenancy implementation here:
  * SolrCloud Document Routing by Joel:
  http://searchhub.org/2013/06/13/solr-cloud-document-routing/
  * Multi-level composite-id routing in SolrCloud:
  http://searchhub.org/2014/01/06/10590/
 
 
 
  On Mon, Aug 11, 2014 at 11:40 PM, Ramprasad Padmanabhan 
  ramprasad...@gmail.com wrote:
 
   I need to store in Solr all data of my clients' mailing activity.
  
   The data contains metadata like From, To, Date, Time, Subject, etc.
  
   I would easily have 1,000 million records every 2 months.
  
   What I am currently doing is creating cores per client, so I have 400
   cores already.
  
   Is this a good idea?
  
   What is the general practice for creating cores?
  
 
 
 
  --
 
  Anshum Gupta
  http://www.anshumgupta.net
 



Re: Help Required

2014-08-12 Thread Dmitry Kan
Hi,

is the http://wiki.apache.org/solr/Support page immutable?

Dmitry


On Fri, Aug 8, 2014 at 4:24 PM, Jack Krupansky j...@basetechnology.com
wrote:

 And the Solr Support list is where people register their available
 consulting services:
 http://wiki.apache.org/solr/Support

 -- Jack Krupansky

 -Original Message- From: Alexandre Rafalovitch
 Sent: Friday, August 8, 2014 9:12 AM
 To: solr-user
 Subject: Re: Help Required


 We don't mediate job offers/positions on this list. We help people
 learn how to build these kinds of things themselves. If you are a
 developer, you may find that it would take only several days to get a
 strong feel for Solr. Especially, if you start from tutorials/right
 books.

 To find developers, using the normal job boards would probably be more
 efficient. That way you can list location, salary, timelines, etc.

 Regards,
   Alex.
 P.s. CityPantry does not actually seem to do what you are asking. They
 are starting from postcode, though possibly use the geodistance
 sorting afterwards.
 P.p.s. Yes, Solr can help with distance-based sorting.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On Fri, Aug 8, 2014 at 11:36 AM, INGRID MARSH
 ingridma...@btinternet.com wrote:

 Dear Sirs,

 I wonder if you can help me?

 I'm looking for a developer who uses Solr to build for me a faceted search
 facility using location. In a nutshell, I need this functionality as in here:

 www.citypantry.com
 wwwdinein.

 Here the vendor, via Google Maps, enters the area/radius they cover, which
 enables the user to enter their postcode and be presented with the vendors
 who serve/cover their area. Is this what Solr does?

 can you put me in touch with small developers who can help?

 Thanks so much.


 Ingrid Marsh





-- 
Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Modifying date format when using TrieDateField.

2014-08-12 Thread Modassar Ather
Hi,

I have a TrieDateField where I want to store a date in yyyy-MM-dd format,
as my source contains the date in the same format.
As I understand it, TrieDateField stores dates in yyyy-MM-dd'T'HH:mm:ss
format, hence the date is getting formatted to the same.

Kindly let me know:
 How can I change the date format during indexing when using
TrieDateField?
 How can I stop the date modification due to time zone? E.g. my
1972-07-03 date is getting changed to 1972-07-03T18:30:00Z when using
TrieDateField.

Thanks,
Modassar


Re: Modifying date format when using TrieDateField.

2014-08-12 Thread Jack Krupansky

Use the parse date update request processor:

http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html

Additional examples are in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-Original Message- 
From: Modassar Ather

Sent: Tuesday, August 12, 2014 7:24 AM
To: solr-user@lucene.apache.org
Subject: Modifying date format when using TrieDateField.

Hi,

I have a TrieDateField where I want to store a date in yyyy-MM-dd format,
as my source contains the date in the same format.
As I understand it, TrieDateField stores dates in yyyy-MM-dd'T'HH:mm:ss
format, hence the date is getting formatted to the same.

Kindly let me know:
How can I change the date format during indexing when using
TrieDateField?
How can I stop the date modification due to time zone? E.g. my
1972-07-03 date is getting changed to 1972-07-03T18:30:00Z when using
TrieDateField.

Thanks,
Modassar 



Re: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 11:50 +0200, Ramprasad Padmanabhan wrote:
 Are there documented benchmarks for the number of cores?
 As of now I just have a test bed.
 
 
 We have 150 million records (will go up to 1,000 million), distributed in 400
 cores.
 On a single machine with 16GB RAM + 16 CPU cores, search is working fine.

About 6M records for a single machine. That is not a lot. What is a
typical query rate for a core?

I would guess that the CPU is idle most of the time and that you could
serve quite a lot more cores from a single machine by increasing RAM or
using SSDs (if you are not doing so already). How large is a typical
core in GB?

 But I am still not sure whether this will work fine in production.

16 cores is not many for a single machine and since you can direct any
search to a single core, you can scale up forever. What is it you are
worried about?

 Obviously I can always add more nodes to solr, but I need to justify how
 much I need.

Are you worried about cost?

- Toke Eskildsen, State and University Library, Denmark




Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
Sorry for the missing information. My Solr cores take less than 200MB of
disk each.

What I am worried about is that if I run too many cores on a single Solr
machine, there will be a limit to the number of concurrent searches it can
support. I am still benchmarking this.


Another major bottleneck I find is adding data to Solr.
I have a cron job that picks data from the MySQL live DB and adds it to
Solr. If I run each core's additions serially it works, but if I try a
multiprocess system then the addition simply hangs, even if all processes
are talking to different cores.

This means that beyond some point my insertion will take too long and I
will have to have multiple servers. Too bad, because there is actually no
problem with data search, only with data add.


Re: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 14:14 +0200, Ramprasad Padmanabhan wrote:
 Sorry for the missing information. My Solr cores take less than 200MB of
 disk each. 

So ~3GB/server. If you do not have especially heavy queries, a high query
rate or strict requirements for index availability, that really sounds
like you could put a lot more cores on each machine.

 What I am worried about is that if I run too many cores on a single Solr
 machine, there will be a limit to the number of concurrent searches it
 can support. I am still benchmarking this. 

By all means, benchmark! Try to pinpoint what limits the amount of
concurrent searches: CPU or IO?

 I have a cron job that picks data from the MySQL live DB and adds it to
 Solr. If I run each core's additions serially it works, but if I try a
 multiprocess system then the addition simply hangs, even if all
 processes are talking to different cores. 

Are you sure the problem is in the Solr end? Have you tried running the
multithreaded extraction without adding the data to Solr?

- Toke Eskildsen, State and University Library, Denmark




Re: Modifying date format when using TrieDateField.

2014-08-12 Thread Modassar Ather
Hi Jack,

Thanks for your suggestion. I think the way I am using
ParseDateFieldUpdateProcessorFactory is not right, hence the date is not
getting transformed to the desired format.
I added the following in solrconfig.xml and see no effect in the search
result. The date is still in yyyy-MM-dd'T'HH:mm:ss format.

<processor class="solr.ParseDateFieldUpdateProcessorFactory">
  <arr name="format">
    <str>yyyy-MM-dd</str>
  </arr>
</processor>

I have the following field type defined in schema.xml. Kindly provide an
example of how to configure it under solrconfig.xml to get the date changed
to the desired format.

<fieldType name="tdate" class="solr.TrieDateField" precisionStep="0"
    positionIncrementGap="0"/>

Also please let me know if I am missing anything in the configuration.

Thanks,
Modassar
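
For what it's worth, a <processor> element on its own at the top level of
solrconfig.xml is ignored: the factory only takes effect inside an
updateRequestProcessorChain that the /update handler actually references.
A minimal sketch of that wiring (the chain name is illustrative):

<updateRequestProcessorChain name="parse-date">
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">parse-date</str>
  </lst>
</requestHandler>

Note that, as Erick explains below, even with this wiring the processor only
affects how incoming values are parsed; it does not change how dates are
stored internally.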


On Tue, Aug 12, 2014 at 5:05 PM, Jack Krupansky j...@basetechnology.com
wrote:

 Use the parse date update request processor:

 http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html

 Additional examples are in my e-book:
 http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

 -- Jack Krupansky

 -Original Message- From: Modassar Ather
 Sent: Tuesday, August 12, 2014 7:24 AM
 To: solr-user@lucene.apache.org
 Subject: Modifying date format when using TrieDateField.


 Hi,

 I have a TrieDateField where I want to store a date in yyyy-MM-dd format,
 as my source contains the date in the same format.
 As I understand it, TrieDateField stores dates in yyyy-MM-dd'T'HH:mm:ss
 format, hence the date is getting formatted to the same.

 Kindly let me know:
 How can I change the date format during indexing when using
 TrieDateField?
 How can I stop the date modification due to time zone? E.g. my
 1972-07-03 date is getting changed to 1972-07-03T18:30:00Z when using
 TrieDateField.

 Thanks,
 Modassar



Re: Can I use multiple cores

2014-08-12 Thread Noble Paul
Hi Ramprasad,


I have used it in a cluster with millions of users (1 user per core) in
legacy cloud mode. We used the on-demand core loading feature, where each
Solr had 30,000 cores and at any time only 2,000 cores were in memory. You
are just hitting 400 and I don't see much of a problem. What is your h/w, BTW?


On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan 
ramprasad...@gmail.com wrote:

 I need to store in Solr all data of my clients' mailing activity.

 The data contains metadata like From, To, Date, Time, Subject, etc.

 I would easily have 1,000 million records every 2 months.

 What I am currently doing is creating cores per client, so I have 400 cores
 already.

 Is this a good idea?

 What is the general practice for creating cores?




-- 
-
Noble Paul


Re: Can I use multiple cores

2014-08-12 Thread Aurélien MAZOYER

Hi Paul and Ramprasad,

I follow your discussion with interest, as I will have more or less the 
same requirement.
When you say that you use on-demand core loading, are you talking about 
the LotsOfCores stuff?
Erick told me that it does not work very well in a distributed 
environment.
How do you handle this problem? Do you use multiple single Solr 
instances? What about failover?


Thanks for your answer,

Aurelien

On 12/08/2014 14:48, Noble Paul wrote:

Hi Ramprasad,


I have used it in a cluster with millions of users (1 user per core) in
legacy cloud mode. We used the on-demand core loading feature, where each
Solr had 30,000 cores and at any time only 2,000 cores were in memory. You are
just hitting 400 and I don't see much of a problem. What is your h/w, BTW?


On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan 
ramprasad...@gmail.com wrote:


I need to store in Solr all data of my clients' mailing activity.

The data contains metadata like From, To, Date, Time, Subject, etc.

I would easily have 1,000 million records every 2 months.

What I am currently doing is creating cores per client, so I have 400 cores
already.

Is this a good idea?

What is the general practice for creating cores?








RE: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-12 Thread Dyer, James
Harun,

What do you mean by "the terminal console"?  Do you mean to say the admin GUI 
freezes but you can still issue queries to Solr directly through your browser?

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr] 
Sent: Tuesday, August 12, 2014 2:46 AM
To: solr-user@lucene.apache.org
Subject: Re: When I use minimum match and maxCollationTries parameters together 
in edismax, Solr gets stuck

I tried again to make sure. The server starts and I can see the web admin 
GUI, but I can't navigate between tabs; it just says "loading". On the 
terminal console, however, everything seems normal.

Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W  http://www.hrzafer.com

On 12.08.2014 09:42, Harun Reşit Zafer wrote:
 It happens once the server is fully started. When it gets stuck, 
 sometimes I have to restart the server; sometimes I'm able to edit the 
 solrconfig.xml and reload it.

 Harun Reşit Zafer
 TÜBİTAK BİLGEM BTE
 Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
 T +90 262 675 3268
 W  http://www.hrzafer.com

 On 11.08.2014 17:32, Dyer, James wrote:
 Harun,

 Just to clarify, is this happening during startup when a warmup query 
 is running, or is this once the server is fully started? This might 
 be another instance of https://issues.apache.org/jira/browse/SOLR-5386 .

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr]
 Sent: Monday, August 11, 2014 8:39 AM
 To: solr-user@lucene.apache.org
 Subject: When I use minimum match and maxCollationTries parameters 
 together in edismax, Solr gets stuck

 Hi,

 In the following configuration, when I uncomment both the mm and
 maxCollationTries lines and run a query on /select, Solr gets stuck
 with no exception.

 I tried different values for both parameters and found that values for
 mm less than 40% still work.


 <requestHandler name="/select" class="solr.SearchHandler">
   <!-- default values for query parameters can be specified, these
        will be overridden by parameters in the request -->
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <str name="defType">edismax</str>
     <int name="timeAllowed">1000</int>
     <str name="qf">title^3 title_s^2 content</str>
     <str name="pf">title content</str>
     <str name="fl">id,title,content,score</str>
     <float name="tie">0.1</float>
     <str name="lowercaseOperators">true</str>
     <str name="stopwords">true</str>
     <!-- <str name="mm">75%</str> -->
     <int name="rows">10</int>

     <str name="spellcheck">on</str>
     <str name="spellcheck.dictionary">default</str>
     <str name="spellcheck.dictionary">wordbreak</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.count">5</str>
     <str name="spellcheck.maxResultsForSuggest">5</str>
     <str name="spellcheck.extendedResults">false</str>
     <str name="spellcheck.alternativeTermCount">2</str>
     <str name="spellcheck.collate">true</str>
     <str name="spellcheck.collateExtendedResults">true</str>
     <str name="spellcheck.maxCollationTries">5</str>
     <!-- <str name="spellcheck.collateParam.mm">100%</str> -->

     <str name="spellcheck.maxCollations">3</str>
   </lst>

   <arr name="last-components">
     <str>spellcheck</str>
   </arr>
 </requestHandler>

 Any idea? Thanks








Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
On 12 August 2014 18:18, Noble Paul noble.p...@gmail.com wrote:

 Hi Ramprasad,


  I have used it in a cluster with millions of users (1 user per core) in
  legacy cloud mode. We used the on-demand core loading feature, where each
  Solr had 30,000 cores and at any time only 2,000 cores were in memory. You
  are just hitting 400 and I don't see much of a problem. What is your h/w, BTW?


 On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan 
 ramprasad...@gmail.com wrote:

   I need to store in Solr all data of my clients' mailing activity.
  
   The data contains metadata like From, To, Date, Time, Subject, etc.
  
   I would easily have 1,000 million records every 2 months.
  
   What I am currently doing is creating cores per client, so I have 400
   cores already.
  
   Is this a good idea?
  
   What is the general practice for creating cores?
 


I have a single machine, 16GB RAM with 16 CPU cores.

What is the h/w you are using?


Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread cwhit
I've been trying to debug through this but I'm stumped.  I have a Solr index
with ~40 million documents, currently sitting idle.  I update an
existing document through the web interface (collection1 -> Documents ->
/update) and the web request returns successfully.  At this point, I expect
the document to be updated in future searches within 1 second, but that's
not the case.  The updated document can sometimes not be visible to searches
for several minutes.  What could be causing this, and how can it be
remedied?

Within my solrconfig.xml, I have the following commit properties set:

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

<autoCommit>
  <maxTime>30</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

Running an identical Solr configuration but with thousands of documents
(rather than tens of millions), the updates are available immediately.






Re: Help Required

2014-08-12 Thread Shawn Heisey
On 8/12/2014 3:57 AM, Dmitry Kan wrote:
 Hi,

 is the http://wiki.apache.org/solr/Support page immutable?

All pages on that wiki are changeable by end users.  You just need to
create an account on the wiki and then ask on this list to have your
wiki username added to the Contributor group.

Thanks,
Shawn



RE: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
Ramprasad Padmanabhan [ramprasad...@gmail.com] wrote:
 I have a single machine 16GB Ram with 16 cpu cores

Ah! I thought you had more machines, each with 16 Solr cores.

This changes a lot. 400 Solr cores of ~200MB ~= 80GB of data. You're aiming for 
7 times that, so about 500GB of data. Running that on a single machine with 
16GB of RAM is not unrealistic, but it depends a lot on how often a search is 
issued and on whether you can unload inactive cores and accept the startup 
penalty of loading them the first time a user searches for something. Searches 
will be really slow if you are using a spinning drive.

You might be interested in 
http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/

As for indexing, I can understand if you run into problems with 400 
concurrent updates to your single-machine setup. You should limit the number of 
concurrent updates to a bit more than the number of CPU cores, so try 20 or 40.

- Toke Eskildsen


Re: Modifying date format when using TrieDateField.

2014-08-12 Thread Erick Erickson
The response will always be the full specification,
so you'll have the yyyy-MM-dd'T'HH:mm:ss format.
If you want the user to just see the yyyy-MM-dd part,
you could use a DocTransformer to change it on
the way out.

You cannot change the way the dates are stored
internally. The date-parsing processor is just there to
allow different inputs; it has no effect on the stored
data at all.

Best,
Erick


On Tue, Aug 12, 2014 at 5:33 AM, Modassar Ather modather1...@gmail.com
wrote:

 Hi Jack,

 Thanks for your suggestion. I think the way I am using
 ParseDateFieldUpdateProcessorFactory is not right, hence the date is not
 getting transformed to the desired format.
 I added the following in solrconfig.xml and see no effect in the search
 result. The date is still in yyyy-MM-dd'T'HH:mm:ss format.

  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>

 I have the following field type defined in schema.xml. Kindly provide an
 example of how to configure it under solrconfig.xml to get the date changed
 to the desired format.

 <fieldType name="tdate" class="solr.TrieDateField" precisionStep="0"
     positionIncrementGap="0"/>

 Also please let me know if I am missing anything in the configuration.

 Thanks,
 Modassar


 On Tue, Aug 12, 2014 at 5:05 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  Use the parse date update request processor:
 
  http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
 
  Additional examples are in my e-book:
  http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
 
  -- Jack Krupansky
 
  -Original Message- From: Modassar Ather
  Sent: Tuesday, August 12, 2014 7:24 AM
  To: solr-user@lucene.apache.org
  Subject: Modifying date format when using TrieDateField.
 
 
  Hi,
 
  I have a TrieDateField where I want to store a date in yyyy-MM-dd format,
  as my source contains the date in the same format.
  As I understand it, TrieDateField stores dates in yyyy-MM-dd'T'HH:mm:ss
  format, hence the date is getting formatted to the same.
 
  Kindly let me know:
  How can I change the date format during indexing when using
  TrieDateField?
  How can I stop the date modification due to time zone? E.g. my
  1972-07-03 date is getting changed to 1972-07-03T18:30:00Z when using
  TrieDateField.
 
  Thanks,
  Modassar
 



Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread Chris Hostetter

You haven't given us a lot of information to go on (ie: full 
solrconfig.xml, log messages around the time of your update, etc...) but 
my best guess would be that you are seeing a delay between the time the 
new searcher is opened and the time the new searcher is made available to 
requests, due to cache warming.

The specifics of your cache configs and newSearcher event listeners 
would impact this ... and of course, you'd see log messages about opening 
the searcher, the cache warming, etc.



: Date: Tue, 12 Aug 2014 07:18:20 -0700 (PDT)
: From: cwhit cwhi...@solinkcorp.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Updates to index not available immediately as index scales,
: even with autoSoftCommit at 1 second
: 
: I've been trying to debug through this but I'm stumped.  I have a Solr index
: with ~40 million documents, currently sitting idle.  I update an
: existing document through the web interface (collection1 -> Documents ->
: /update) and the web request returns successfully.  At this point, I expect
: the document to be updated in future searches within 1 second, but that's
: not the case.  The updated document can sometimes not be visible to searches
: for several minutes.  What could be causing this, and how can it be
: remedied?
: 
: Within my solrconfig.xml, I have the following commit properties set:
: 
: <autoSoftCommit>
:   <maxTime>1000</maxTime>
: </autoSoftCommit>
: 
: <autoCommit>
:   <maxTime>30</maxTime>
:   <openSearcher>false</openSearcher>
: </autoCommit>
: 
: Running an identical Solr configuration but with thousands of documents
: (rather than tens of millions), the updates are available immediately.
: 
: 
: 
: 
: 

-Hoss
http://www.lucidworks.com/


Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread cwhit
I'm not seeing any messages in the log with respect to cache warming at the
time, but I will investigate that possibility.  Thank you.  In case it is
helpful, I pasted the entire solrconfig.xml at http://pastebin.com/C0iQ7E9a





Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread Chris Hostetter

: I'm not seeing any messages in the log with respect to cache warming at the
: time, but I will investigate that possibility.  Thank you.  In case it is

what logs *do* you see at the time you send the doc? 

w/o details, we can't help you.

: helpful, I pasted the entire solrconfig.xml at http://pastebin.com/C0iQ7E9a


-Hoss
http://www.lucidworks.com/


Re: Can I use multiple cores

2014-08-12 Thread Noble Paul
The machines were 32GB RAM boxes. You must do the RAM requirement
calculation for your indexes; the number of indexes alone won't be enough
to arrive at the RAM requirement.


On Tue, Aug 12, 2014 at 6:59 PM, Ramprasad Padmanabhan 
ramprasad...@gmail.com wrote:

 On 12 August 2014 18:18, Noble Paul noble.p...@gmail.com wrote:

  Hi Ramprasad,
 
 
  I have used it in a cluster with millions of users (1 user per core) in
  legacy cloud mode. We used the on-demand core loading feature, where each
  Solr had 30,000 cores and at any time only 2,000 cores were in memory. You
  are just hitting 400 and I don't see much of a problem. What is your h/w,
  BTW?
 
 
  On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan 
  ramprasad...@gmail.com wrote:
 
   I need to store in Solr all data of my clients' mailing activity.
  
   The data contains metadata like From, To, Date, Time, Subject, etc.
  
   I would easily have 1,000 million records every 2 months.
  
   What I am currently doing is creating cores per client, so I have 400
   cores already.
  
   Is this a good idea?
  
   What is the general practice for creating cores?
  
 
 
 I have a single machine, 16GB RAM with 16 CPU cores.

 What is the h/w you are using?




-- 
-
Noble Paul


Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread cwhit
Immediately after triggering the update, this is what is in the logs:

2014-08-12 12:58:48,774 | [71] | 153414367 [qtp2038499066-4772] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [collection1]
webapp=/solr path=/update params={wt=json} {add=[52627624
(1476251068652322816)]} 0 34

2014-08-12 12:58:49,773 | [71] | 153415369 [commitScheduler-7-thread-1] INFO 
org.apache.solr.update.UpdateHandler  – start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

2014-08-12 12:58:49,862 | [71] | 153415459 [commitScheduler-7-thread-1] INFO 
org.apache.solr.search.SolrIndexSearcher  – Opening Searcher@65c48c06 main

2014-08-12 12:58:49,874 | [71] | 153415472 [commitScheduler-7-thread-1] INFO 
org.apache.solr.update.UpdateHandler  – end_commit_flush

The end_commit_flush leads me to believe that the soft commit has completed,
but perhaps that thought is wrong.  There are no other logs for a while,
until 

2014-08-12 13:03:49,556 | [71] | 153715147 [commitScheduler-6-thread-1] INFO 
org.apache.solr.update.UpdateHandler  – start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

2014-08-12 13:03:49,805 | [71] | 153715402 [commitScheduler-6-thread-1] INFO 
org.apache.solr.core.SolrCore  – SolrDeletionPolicy.onCommit: commits: num=2

2014-08-12 13:03:49,805 | [71] |
commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@E:\Program
Files (x86)\SolrLive\SolrFiles\Solr\service\solr\data\index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@1fac1a3c;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_2we,generation=3758}

2014-08-12 13:03:49,805 | [71] |
commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@E:\Program
Files (x86)\SolrLive\SolrFiles\Solr\service\solr\data\index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@1fac1a3c;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_2wf,generation=3759}

2014-08-12 13:03:49,805 | [34] | 153715403 [commitScheduler-6-thread-1] INFO 
org.apache.solr.core.SolrCore  – newest commit generation = 3759

2014-08-12 13:03:49,818 | [34] | 153715415 [commitScheduler-6-thread-1] INFO 
org.apache.solr.update.UpdateHandler  – end_commit_flush
At this point, the update is still not present...


2014-08-12 13:11:45,279 | [81] | 154190876 [searcherExecutor-4-thread-1]
INFO  org.apache.solr.core.SolrCore  – QuerySenderListener sending requests
to Searcher@65c48c06 main{StandardDirectoryReader(segments_2we:82217:nrt
_qkc(4.6):C8161558/879724:delGen=275 _sra(4.6):C2943436/247953:delGen=51
_r2w(4.6):C1149753/18376:delGen=55 _rgs(4.6):C1468449/648612:delGen=107
_tdl(4.6):C583431/7873:delGen=94 _svo(4.6):C197286/7:delGen=5
_t4d(4.6):C247031/2928:delGen=36 _tkf(4.6):C111429/761:delGen=23
_tch(4.6):C6014/81:delGen=22 _tk5(4.6):C3907/242:delGen=21
_tjv(4.6):C3492/119:delGen=13 _thd(4.6):C5014/241:delGen=24
_tdh(4.6):C5375/437:delGen=30 _tj1(4.6):C5989/15:delGen=6
_tkq(4.6):C1749/36:delGen=6 _tmj(4.6):C961/1:delGen=1
_tlm(4.6):C714/9:delGen=5 _tm6(4.6):C2616 _tlx(4.6):C1105/273:delGen=3
_tly(4.6):C5/2:delGen=1 _tm2(4.6):C1 _tm4(4.6):C1 _tmb(4.6):C1 _tmk(4.6):C5
_tml(4.6):C12 _tmm(4.6):C1 _tmn(4.6):C2/1:delGen=1 _tmo(4.6):C1 _tmp(4.6):C1
_tmr(4.6):C1 _tms(4.6):C1)}
2014-08-12 13:11:45,280 | [81] | 154190877 [searcherExecutor-4-thread-1]
INFO  org.apache.solr.core.SolrCore  – QuerySenderListener done.

2014-08-12 13:11:45,280 | [81] | 154190877 [searcherExecutor-4-thread-1]
INFO  org.apache.solr.handler.component.SpellCheckComponent  – Building
spell index for spell checker: suggest

2014-08-12 13:11:45,280 | [81] | 154190877 [searcherExecutor-4-thread-1]
INFO  org.apache.solr.spelling.suggest.Suggester  – build()

Still no update...

2014-08-12 13:12:58,424 | [81] | 154264021 [searcherExecutor-4-thread-1]
INFO  org.apache.solr.core.SolrCore  – [collection1] Registered new searcher
Searcher@65c48c06 main{StandardDirectoryReader(segments_2we:82217:nrt
_qkc(4.6):C8161558/879724:delGen=275 _sra(4.6):C2943436/247953:delGen=51
_r2w(4.6):C1149753/18376:delGen=55 _rgs(4.6):C1468449/648612:delGen=107
_tdl(4.6):C583431/7873:delGen=94 _svo(4.6):C197286/7:delGen=5
_t4d(4.6):C247031/2928:delGen=36 _tkf(4.6):C111429/761:delGen=23
_tch(4.6):C6014/81:delGen=22 _tk5(4.6):C3907/242:delGen=21
_tjv(4.6):C3492/119:delGen=13 _thd(4.6):C5014/241:delGen=24
_tdh(4.6):C5375/437:delGen=30 _tj1(4.6):C5989/15:delGen=6
_tkq(4.6):C1749/36:delGen=6 _tmj(4.6):C961/1:delGen=1
_tlm(4.6):C714/9:delGen=5 _tm6(4.6):C2616 _tlx(4.6):C1105/273:delGen=3
_tly(4.6):C5/2:delGen=1 _tm2(4.6):C1 _tm4(4.6):C1 _tmb(4.6):C1 _tmk(4.6):C5
_tml(4.6):C12 _tmm(4.6):C1 _tmn(4.6):C2/1:delGen=1 _tmo(4.6):C1 _tmp(4.6):C1
_tmr(4.6):C1 _tms(4.6):C1)}

There we go!  Finally an update!  Almost 15 minutes after making the update,
it is visible to queries.




Re: what's the difference between solr and elasticsearch in hdfs case?

2014-08-12 Thread Erick Erickson
I just pinged someone who really knows this stuff and the reply is that
he's copied the index from HDFS to a local file system in order to
inspect it with Luke, which means the bits on disk are identical and
may freely be copied back and forth. So I'd say go for it.

Erick


On Tue, Aug 12, 2014 at 12:28 AM, Jianyi phoenix.w.2...@qq.com wrote:

 Hi Alex,

 Thanks for your reply.

 I'm comparing Solr vs. Elasticsearch.

 Does Solr store its index on HDFS in raw Lucene format? I mean, if it
 does, we can get the index files from HDFS and directly put them into an
 application based on Lucene.

 It seems that Elasticsearch does not store the raw Lucene index on HDFS
 directly. It has its own special data structures and operations.






RE: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread Matt Kuiper (Springblox)
Based on your solrconfig.xml settings for the filter and queryResult caches, I 
believe Chris's initial guess is correct.  After a commit, there is likely 
plenty of time spent warming these caches due to the significantly high 
autowarm counts.

<filterCache class="solr.FastLRUCache"
             size="16384"
             initialSize="4096"
             autowarmCount="4096"/>

<queryResultCache class="solr.FastLRUCache"
                  size="8192"
                  initialSize="8192"
                  autowarmCount="2048"/>

I suggest you try setting the autowarmCount very low or to zero, and then 
testing to confirm the problem.
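
For example, the same caches with warming effectively disabled (sizes kept,
autowarmCount zeroed) would look like this:

<filterCache class="solr.FastLRUCache"
             size="16384"
             initialSize="4096"
             autowarmCount="0"/>

<queryResultCache class="solr.FastLRUCache"
                  size="8192"
                  initialSize="8192"
                  autowarmCount="0"/>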

You might also want to monitor whether any JVM garbage collections are 
occurring during this time and causing system pauses. With such large caches 
(nominally stored in Old Gen) you may be setting yourself up for GCs that 
take a significant amount of time and thus add to your delay.

Matt


-Original Message-
From: cwhit [mailto:cwhi...@solinkcorp.com] 
Sent: Tuesday, August 12, 2014 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Updates to index not available immediately as index scales, even 
with autoSoftCommit at 1 second

Immediately after triggering the update, this is what is in the logs:

2014-08-12 12:58:48,774 | [71] | 153414367 [qtp2038499066-4772] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [collection1] 
webapp=/solr path=/update params={wt=json} {add=[52627624 
(1476251068652322816)]} 0 34

2014-08-12 12:58:49,773 | [71] | 153415369 [commitScheduler-7-thread-1] INFO 
org.apache.solr.update.UpdateHandler  – start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

2014-08-12 12:58:49,862 | [71] | 153415459 [commitScheduler-7-thread-1] INFO 
org.apache.solr.search.SolrIndexSearcher  – Opening Searcher@65c48c06 main

2014-08-12 12:58:49,874 | [71] | 153415472 [commitScheduler-7-thread-1] INFO 
org.apache.solr.update.UpdateHandler  – end_commit_flush

The end_commit_flush leads me to believe that the soft commit has completed, 
but perhaps that thought is wrong.  There are no other logs for a while, until 


2014-08-12 13:03:49,556 | [71] | 153715147 [commitScheduler-6-thread-1] INFO 
org.apache.solr.update.UpdateHandler  – start 
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

2014-08-12 13:03:49,805 | [71] | 153715402 [commitScheduler-6-thread-1] INFO 
org.apache.solr.core.SolrCore  – SolrDeletionPolicy.onCommit: commits: num=2

2014-08-12 13:03:49,805 | [71] |
commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@E:\Program
Files (x86)\SolrLive\SolrFiles\Solr\service\solr\data\index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@1fac1a3c;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_2we,generation=3758}

2014-08-12 13:03:49,805 | [71] |
commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@E:\Program
Files (x86)\SolrLive\SolrFiles\Solr\service\solr\data\index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@1fac1a3c;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_2wf,generation=3759}

2014-08-12 13:03:49,805 | [34] | 153715403 [commitScheduler-6-thread-1] INFO 
org.apache.solr.core.SolrCore  – newest commit generation = 3759

2014-08-12 13:03:49,818 | [34] | 153715415 [commitScheduler-6-thread-1] INFO 
org.apache.solr.update.UpdateHandler  – end_commit_flush

At this point, the update is still not present...


2014-08-12 13:11:45,279 | [81] | 154190876 [searcherExecutor-4-thread-1]
INFO  org.apache.solr.core.SolrCore  – QuerySenderListener sending requests
to Searcher@65c48c06 main{StandardDirectoryReader(segments_2we:82217:nrt
_qkc(4.6):C8161558/879724:delGen=275 _sra(4.6):C2943436/247953:delGen=51
_r2w(4.6):C1149753/18376:delGen=55 _rgs(4.6):C1468449/648612:delGen=107
_tdl(4.6):C583431/7873:delGen=94 _svo(4.6):C197286/7:delGen=5
_t4d(4.6):C247031/2928:delGen=36 _tkf(4.6):C111429/761:delGen=23
_tch(4.6):C6014/81:delGen=22 _tk5(4.6):C3907/242:delGen=21
_tjv(4.6):C3492/119:delGen=13 _thd(4.6):C5014/241:delGen=24
_tdh(4.6):C5375/437:delGen=30 _tj1(4.6):C5989/15:delGen=6
_tkq(4.6):C1749/36:delGen=6 _tmj(4.6):C961/1:delGen=1
_tlm(4.6):C714/9:delGen=5 _tm6(4.6):C2616 _tlx(4.6):C1105/273:delGen=3
_tly(4.6):C5/2:delGen=1 _tm2(4.6):C1 _tm4(4.6):C1 _tmb(4.6):C1 _tmk(4.6):C5
_tml(4.6):C12 _tmm(4.6):C1 _tmn(4.6):C2/1:delGen=1 _tmo(4.6):C1 _tmp(4.6):C1
_tmr(4.6):C1 _tms(4.6):C1)}
2014-08-12 13:11:45,280 | [81] | 154190877 [searcherExecutor-4-thread-1]
INFO  org.apache.solr.core.SolrCore  – QuerySenderListener done.

2014-08-12 13:11:45,280 | [81] | 154190877 [searcherExecutor-4-thread-1]
INFO  org.apache.solr.handler.component.SpellCheckComponent  – Building
spell index for spell checker: suggest

2014-08-12 13:11:45,280 | [81] | 154190877 [searcherExecutor-4-thread-1]
INFO  org.apache.solr.spelling.suggest.Suggester  – build()

Still no 

Re: SolrCloud OOM Problem

2014-08-12 Thread tuxedomoon
I have modified my instances to m2.4xlarge 64-bit with 68.4G memory.  Hate to
ask this, but can you recommend Java memory and GC settings for 90G of data and
the above memory?  Currently I have

CATALINA_OPTS=${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m
-Xms5120m -Xmx5120m -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
-XX:+UseConcMarkSweepGC

Doesn't this mean I am starting with 5G and never going over 5G?

I've seen a few of those uninverted multi-valued field OOMs already on the
upgraded host.

Thanks

Tux









Access request

2014-08-12 Thread Vitaliy Zhovtiuk

Hello,

Please provide me access.
User id vzhovtyuk
My email vzhovt...@gmail.com
Wiki user 'Vitaliy Zhovtyuk'


ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Shawn Heisey

The field value is this:

20世紀の100人;ポートレートアーカイブス;政治家・軍人;政治家・指導者・軍人;[政 治],100peopeof20century,pploftwentycentury,pploftwentycentury


The problem: We can't match this field with a search for 
100peopeof20century. The analysis shows that there are three terms 
indexed at the critical point by ICUTokenizerFactory: 治, 100, and 
peopeof20century. The 'script' value for the 100 term is 
Chinese/Japanese instead of Latin. Adding a space before 100 doesn't 
make any difference in the analysis.


This seems like a bug. Can anyone confirm?

This is the fieldType being used:

<fieldType name="keyText" class="solr.TextField" sortMissingLast="true"
    omitNorms="true" positionIncrementGap="0">
  <analyzer type="index">
    <!-- remove spaces among hangul and han characters if there
         is at least one hangul character -->
    <!-- a korean char guaranteed at the start of the pattern:
         pattern="(\p{Hangul}\p{Han}*)\s+(?=[\p{Hangul}\p{Han}])" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="([\p{InHangul_Jamo}\p{InHangul_Compatibility_Jamo}\p{InHangul_Syllables}][\p{InBopomofo}\p{InBopomofo_Extended}\p{InCJK_Compatibility}\p{InCJK_Compatibility_Forms}\p{InCJK_Compatibility_Ideographs}\p{InCJK_Compatibility_Ideographs_Supplement}\p{InCJK_Radicals_Supplement}\p{InCJK_Symbols_And_Punctuation}\p{InCJK_Unified_Ideographs}\p{InCJK_Unified_Ideographs_Extension_A}\p{InCJK_Unified_Ideographs_Extension_B}\p{InKangxi_Radicals}\p{InHalfwidth_And_Fullwidth_Forms}\p{InIdeographic_Description_Characters}]*)\s+(?=[\p{InHangul_Jamo}\p{InHangul_Compatibility_Jamo}\p{InHangul_Syllables}\p{InBopomofo}\p{InBopomofo_Extended}\p{InCJK_Compatibility}\p{InCJK_Compatibility_Forms}\p{InCJK_Compatibility_Ideographs}\p{InCJK_Compatibility_Ideographs_Supplement}\p{InCJK_Radicals_Supplement}\p{InCJK_Symbols_And_Punctuation}\p{InCJK_Unified_Ideographs}\p{InCJK_Unified_Ideographs_Extension_A}\p{InCJK_Unified_Ideographs_Extension_B}\p{InKangxi_Radicals}\p{InHalfwidth_And_Fullwidth_Forms}\p{InIdeographic_Description_Characters}])"
        replacement="$1"/>
    <!-- a korean char guaranteed at the end of the pattern:
         pattern="([\p{Hangul}\p{Han}])\s+(?=[\p{Han}\s]*\p{Hangul})" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="([\p{InHangul_Jamo}\p{InHangul_Compatibility_Jamo}\p{InHangul_Syllables}\p{InBopomofo}\p{InBopomofo_Extended}\p{InCJK_Compatibility}\p{InCJK_Compatibility_Forms}\p{InCJK_Compatibility_Ideographs}\p{InCJK_Compatibility_Ideographs_Supplement}\p{InCJK_Radicals_Supplement}\p{InCJK_Symbols_And_Punctuation}\p{InCJK_Unified_Ideographs}\p{InCJK_Unified_Ideographs_Extension_A}\p{InCJK_Unified_Ideographs_Extension_B}\p{InKangxi_Radicals}\p{InHalfwidth_And_Fullwidth_Forms}\p{InIdeographic_Description_Characters}])\s+(?=[\p{InBopomofo}\p{InBopomofo_Extended}\p{InCJK_Compatibility}\p{InCJK_Compatibility_Forms}\p{InCJK_Compatibility_Ideographs}\p{InCJK_Compatibility_Ideographs_Supplement}\p{InCJK_Radicals_Supplement}\p{InCJK_Symbols_And_Punctuation}\p{InCJK_Unified_Ideographs}\p{InCJK_Unified_Ideographs_Extension_A}\p{InCJK_Unified_Ideographs_Extension_B}\p{InKangxi_Radicals}\p{InHalfwidth_And_Fullwidth_Forms}\p{InIdeographic_Description_Characters}\s]*[\p{InHangul_Jamo}\p{InHangul_Compatibility_Jamo}\p{InHangul_Syllables}])"
        replacement="$1"/>

    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
        replacement="$2"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="edu.stanford.lucene.analysis.CJKFoldingFilterFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
    <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.ICUNormalizer2FilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"
        han="true" hiragana="true" katakana="true"
        hangul="true" outputUnigrams="true"/>
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
  <analyzer type="query">
    <!-- remove spaces among hangul and han characters if there
         is at least one hangul character -->
    <!-- a korean char guaranteed at the start of the pattern:
         pattern="(\p{Hangul}\p{Han}*)\s+(?=[\p{Hangul}\p{Han}])" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"

Re: SolrCloud OOM Problem

2014-08-12 Thread Shawn Heisey

On 8/12/2014 3:12 PM, tuxedomoon wrote:

I have modified my instances to m2.4xlarge 64-bit with 68.4G memory.  Hate to
ask this but can you recommend Java memory and GC settings for 90G data and
the above memory?  Currently I have

CATALINA_OPTS=${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m
-Xms5120m -Xmx5120m -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
-XX:+UseConcMarkSweepGC

Doesn't this mean I am starting with 5G and never going over 5G?


Yes, that's exactly what it means -- you have a heap size limit of 5GB.  
The OutOfMemory error indicates that Solr needs more heap space than it 
is getting.  You'll need to raise the -Xmx value. It is usually 
advisable to configure -Xms to match.


The wiki page I linked before includes a link to the following page, 
listing the GC options that I use beyond the -Xmx setting:


http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Thanks,
Shawn



Re: Access request

2014-08-12 Thread Shawn Heisey

On 8/12/2014 3:29 PM, Vitaliy Zhovtiuk wrote:

Please provide me access.
User id vzhovtyuk
My email vzhovt...@gmail.com
Wiki user 'Vitaliy Zhovtyuk'


Wiki username added to the Solr wiki contributors group. You didn't 
indicate exactly what kind of access you wanted, but that's the only 
kind of access that I am able to grant to end users.


Thanks,
Shawn



Re: ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Shawn Heisey
See the original message on this thread for full details.  Some
additional information:

This happens on version 4.6.1, 4.7.2, and 4.9.0.  Here is a screenshot
showing the analysis problem in more detail.  The first line you can see
is the ICUTokenizer.

https://www.dropbox.com/s/9wbi7lz77ivya9j/ICUTokenizer-wrong-analysis.png

The original field value was:

20世紀の100人;ポートレートアーカイブス;政治家・軍人;政治家・指導者・軍人;[政 治],100peopeof20century,pploftwentycentury,pploftwentycentury

Thanks,
Shawn



Solr query involving Street Addresses

2014-08-12 Thread Guph
I'm very new to Solr, and could use a point in the right direction on a task
I've been assigned.  I have a database containing customer information
(phone number, email address, credit card, billing address, shipping
address, etc.).

I need to be able to take user-entered data, and use it to search through
the database records to make decisions/take actions based on how closely the
entered data matches the fields in the indexed documents.  

For the first 3 pieces of data I mentioned above, I will need to act on
exact matches, but for the two address fields, I would like to be able to
take actions based on how closely the entered data matches the information
in the data collection (if the entered shipping address matches 85% of the
shipping or billing addresses in any of the documents in the collection, do
X, otherwise, do Y, as an example).  The addresses contain street address,
city, state, and zip code.
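
To make that concrete, one possible starting point -- a sketch only, with
hypothetical field names and values -- is an edismax query with a
minimum-should-match percentage over the tokenized address:

  q=shipping_address:(123 Main St Springfield IL 62704)
  &defType=edismax
  &mm=85%

Note that mm counts matched query terms rather than string similarity, so
it only approximates an "85% of the address" threshold; the exact-match
fields (phone, email, credit card) are better served by untokenized string
fields and filter queries.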

Any direction or suggestions on this would be extremely appreciated - As
I've said, I'm new to Solr, and could use any help that can be provided.

Thanks,

Guph




Re: ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Steve Rowe
Shawn,

ICUTokenizer is operating as designed here.  

The key to understanding this is 
o.a.l.analysis.icu.segmentation.ScriptIterator.isSameScript(), called from 
ScriptIterator.next() with the scripts of two consecutive characters; these 
methods together find script boundaries.  Here’s ScriptIterator.isSameScript():

  /** Determine if two scripts are compatible. */
  private static boolean isSameScript(int scriptOne, int scriptTwo) {
    return scriptOne <= UScript.INHERITED || scriptTwo <= UScript.INHERITED
        || scriptOne == scriptTwo;
  }

ASCII digits are in the Unicode script named “Common” (see 
http://www.unicode.org/Public/6.3.0/ucd/Scripts.txt), and UScript.COMMON (0) 
is less than UScript.INHERITED (1) (see 
http://www.icu-project.org/~mow/ICU4JCodeCoverage/Current/com/ibm/icu/lang/UScript.html),
 so there will be no script boundary detected between a character from an 
oriental script followed by an ASCII digit, or vice versa - the ASCII digit 
will be assigned the same script as the preceding character.

See UAX#24 for more info: http://www.unicode.org/reports/tr24/tr24-21.html 
(that’s the Unicode 6.3.0 version, which is supported by Lucene/Solr 4.9).

Steve
 

On Aug 12, 2014, at 7:00 PM, Shawn Heisey s...@elyograg.org wrote:

 See the original message on this thread for full details.  Some
 additional information:
 
 This happens on version 4.6.1, 4.7.2, and 4.9.0.  Here is a screenshot
 showing the analysis problem in more detail.  The first line you can see
 is the ICUTokenizer.
 
 https://www.dropbox.com/s/9wbi7lz77ivya9j/ICUTokenizer-wrong-analysis.png
 
 The original field value was:
 
 20世紀の100人;ポートレートアーカイブス;政治家・軍人;政治家・指導
 者・軍人;[政 治],100peopeof20century,pploftwentycentury,pploftwentycentury
 
 Thanks,
 Shawn
 



Re: ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Shawn Heisey
On 8/12/2014 6:29 PM, Steve Rowe wrote:
 Shawn,
 
 ICUTokenizer is operating as designed here.  
 
 The key to understanding this is 
 o.a.l.analysis.icu.segmentation.ScriptIterator.isSameScript(), called from 
 ScriptIterator.next() with the scripts of two consecutive characters; these 
 methods together find script boundaries.  Here’s 
 ScriptIterator.isSameScript():
 
   /** Determine if two scripts are compatible. */
   private static boolean isSameScript(int scriptOne, int scriptTwo) {
 return scriptOne <= UScript.INHERITED || scriptTwo <= UScript.INHERITED
 || scriptOne == scriptTwo;
   }
 
 ASCII digits are in the Unicode script named “Common” (see 
 http://www.unicode.org/Public/6.3.0/ucd/Scripts.txt), and UScript.COMMON 
 (0) is less than UScript.INHERITED (1) (see 
 http://www.icu-project.org/~mow/ICU4JCodeCoverage/Current/com/ibm/icu/lang/UScript.html),
  so there will be no script boundary detected between a character from an 
 oriental script followed by an ASCII digit, or vice versa - the ASCII digit 
 will be assigned the same script as the preceding character.
 
 See UAX#24 for more info: http://www.unicode.org/reports/tr24/tr24-21.html 
 (that’s the Unicode 6.3.0 version, which is supported by Lucene/Solr 4.9).

So the punctuation isn't considered break-worthy?

This input:

[政 治],100foo

Becomes 政 治, 100, and foo.

Thanks,
Shawn



Re: ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Steve Rowe
In the table below, the IsSameS (is same script) and SBreak? (script
break = not IsSameS) decisions are based on what I mentioned in my previous
message, and the WBreak (word break) decision is based on UAX#29 word
break rules:

Char   Code Point   Script   IsSameS?   SBreak?   WBreak?
----   ----------   ------   --------   -------   -------
治     U+6CBB       Han      Yes        No         Yes
]      U+005D       Common   Yes        No         Yes
,      U+002C       Common   Yes        No         Yes
1      U+0031       Common   --         --         --

First, script boundaries are found and used as token boundaries - in the
above case, no script boundary is found between 治 and 1. Then UAX#29
word break rules are used to find token boundaries in between script
boundaries - in the above case, there are word boundaries between each
character, but ICUTokenizer throws away punctuation-only sequences between
token boundaries.
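
If you want to watch this happen, here is a minimal sketch against the
Lucene 4.x API (it assumes lucene-core and lucene-analyzers-icu are on
the classpath):

import java.io.StringReader;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ICUTokenizerDemo {
  public static void main(String[] args) throws Exception {
    // Tokenize the mixed-script input from the earlier message.
    ICUTokenizer tokenizer =
        new ICUTokenizer(new StringReader("[政 治],100foo"));
    CharTermAttribute term =
        tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      // Prints the tokens discussed above: 政, 治, 100, foo --
      // the surrounding punctuation-only sequences are discarded.
      System.out.println(term.toString());
    }
    tokenizer.end();
    tokenizer.close();
  }
}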

Steve
www.lucidworks.com


On Tue, Aug 12, 2014 at 9:01 PM, Shawn Heisey s...@elyograg.org wrote:

 On 8/12/2014 6:29 PM, Steve Rowe wrote:
  Shawn,
 
  ICUTokenizer is operating as designed here.
 
  The key to understanding this is
 o.a.l.analysis.icu.segmentation.ScriptIterator.isSameScript(), called from
 ScriptIterator.next() with the scripts of two consecutive characters; these
 methods together find script boundaries.  Here’s
 ScriptIterator.isSameScript():
 
/** Determine if two scripts are compatible. */
private static boolean isSameScript(int scriptOne, int scriptTwo) {
   return scriptOne <= UScript.INHERITED || scriptTwo <=
 UScript.INHERITED
  || scriptOne == scriptTwo;
}
 
  ASCII digits are in the Unicode script named “Common” (see 
 http://www.unicode.org/Public/6.3.0/ucd/Scripts.txt), and UScript.COMMON
 (0) is less than UScript.INHERITED (1) (see 
 http://www.icu-project.org/~mow/ICU4JCodeCoverage/Current/com/ibm/icu/lang/UScript.html),
 so there will be no script boundary detected between a character from an
 oriental script followed by an ASCII digit, or vice versa - the ASCII digit
 will be assigned the same script as the preceding character.
 
  See UAX#24 for more info: 
 http://www.unicode.org/reports/tr24/tr24-21.html (that’s the Unicode
 6.3.0 version, which is supported by Lucene/Solr 4.9).

 So the punctuation isn't considered break-worthy?

 This input:

 [政 治],100foo

 Becomes 政 治, 100, and foo.

 Thanks,
 Shawn




Re: what's the difference between solr and elasticsearch in hdfs case?

2014-08-12 Thread Jianyi
Thanks Erick.

I will try. 





Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan

On 12 August 2014 22:12, Noble Paul noble.p...@gmail.com wrote:

 The machines were 32GB ram boxes. You must do the RAM requirement


And how many machines are running Solr?

I expect that I will have to add more servers. What I am looking for is how
to calculate how many servers I need.
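
A purely illustrative back-of-envelope calculation (every figure here is
an assumption -- substitute your own measured numbers, and load-test
before committing to hardware):

  1,000M docs / 2 months         ~= 16.7M new docs per day
  assume ~1KB of index per doc   -> ~1TB of index per 2-month window
  assume one 32GB box comfortably serves ~150GB of index
  -> roughly 7 boxes per window, before adding replicas for redundancy

The only reliable sizing method is to index a representative sample on
one box, measure index size and query latency, and extrapolate from that.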