RE: yet another optimize question

2013-06-20 Thread Toke Eskildsen
Petersen, Robert [robert.peter...@mail.rakuten.com] wrote:
 We actually have hundreds of facet-able fields, but most are specialized
 and are only faceted upon if the user has drilled into the particular category
 to which they are applicable and so they are only indexed for products
 in those categories.  I guess it is the facets that eat up so much of our
 memory.

As Andre mentions, the problem is that the fc facet method maintains a list of 
values (or pointers to values, if we're talking text) for each document in the 
whole index. Faceting on a field that only has a single value in a single 
document in the whole index still allocates memory linear to the total number 
of documents. You are in the same situation as John Nielsen in the thread "Solr 
using a ridiculous amount of memory":
http://lucene.472066.n3.nabble.com/Solr-using-a-ridiculous-amount-of-memory-tt4050840.html#none

You could try and change the way you index the facet information to get around 
this waste, but it is quite a lot of work:
http://sbdevel.wordpress.com/2013/04/16/you-are-faceting-itwrong/

 It was suggested that if I use facet method = enum for those particular
 specialized facets then my memory usage would go down.

If the number of unique values in the individual facets is low, this could 
work. If nothing else, it is very easy to try.
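
For what it's worth, facet.method can also be overridden per field at request
time, so you could switch just the specialized facets to enum and leave the rest
on fc. A rough sketch, with a placeholder field name:

  f.my_special_facet.facet.method=enum
  f.my_special_facet.facet.enum.cache.minDf=100

The minDf setting makes Solr skip the filterCache for terms that appear in fewer
than 100 documents, which limits filterCache churn when the value counts are low
but the fields are many.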

- Toke Eskildsen

Re: UnInverted multi-valued field

2013-06-20 Thread Jochen Lienhard

Hello,

well ... we have 5 multi-valued facet fields, so we sometimes had to wait 
up to one minute.


The old searcher blocks during this time.

@Toke Eskildsen: the example I posted was a very small update, usually 
there are more terms.


We are using Solr 3.6. I don't know if it will be faster with 4.x.

These are the configurations of our caches:

<filterCache
  class="solr.FastLRUCache"
  size="300000"
  initialSize="300000"
  autowarmCount="50000"/>

<queryResultCache
  class="solr.LRUCache"
  size="100000"
  initialSize="100000"
  autowarmCount="50000"/>

<documentCache
  class="solr.LRUCache"
  size="50000"
  initialSize="50000"
  autowarmCount="10000"/>

We have 5 million documents in our index.
@Roman: Do you think our autowarmCount should be larger?

Greetings

Jochen

Roman Chyla wrote:

On Wed, Jun 19, 2013 at 5:30 AM, Jochen Lienhard 
lienh...@ub.uni-freiburg.de wrote:


Hi @all.

We have the problem that after an update the index takes too much time to
warm up.

We have some multi-valued facet fields, and during startup Solr logs messages
like this:

INFO: UnInverted multi-valued field {field=mt_facet,memSize=18753256,tindexSize=54,time=170,phase1=156,nTerms=17,bigTerms=3,termInstances=903276,uses=0}


In the solrconfig we use the facet.method 'fc'.
We know that the start-up with the method 'enum' is faster, but then the
searches are very slow.

How do you handle this problem?
Or do you have any ideas for optimizing the warm-up?
Or what do you do after an update?


You probably know, but just in case... you may use autowarming; the new
searcher will populate the cache, and only after the warmup queries have
finished will it be exposed to the world. The old searcher continues to
handle requests in the meantime.

roman



Greetings

Jochen

--
Dr. rer. nat. Jochen Lienhard
Dezernat EDV

Albert-Ludwigs-Universität Freiburg
Universitätsbibliothek
Rempartstr. 10-16  | Postfach 1629
79098 Freiburg | 79016 Freiburg

Telefon: +49 761 203-3908
E-Mail: lienh...@ub.uni-freiburg.de
Internet: www.ub.uni-freiburg.de





--
Dr. rer. nat. Jochen Lienhard
Dezernat EDV

Albert-Ludwigs-Universität Freiburg
Universitätsbibliothek
Rempartstr. 10-16  | Postfach 1629
79098 Freiburg | 79016 Freiburg

Telefon: +49 761 203-3908
E-Mail: lienh...@ub.uni-freiburg.de
Internet: www.ub.uni-freiburg.de



DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor

2013-06-20 Thread Constantin Wolber
Hi,

I searched for a solution for quite some time but did not manage to find any 
real hints on how to fix it. 


I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in a tomcat 
6 container.

My data import setup is basically the following:

Data-config.xml:

<entity
    name="article"
    dataSource="ds1"
    query="SELECT * FROM article"
    deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
    deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}"
    pk="myownid">
    <field column="myownid" name="id"/>

    <entity
        name="supplier"
        dataSource="ds2"
        query="SELECT * FROM supplier WHERE status=1"
        processor="CachedSqlEntityProcessor"
        cacheKey="SUPPLIER_ID"
        cacheLookup="article.ARTICLE_SUPPLIER_ID"
    />

    <entity
        name="attributes"
        dataSource="ds1"
        query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
        cacheKey="ARTICLE_ID"
        cacheLookup="article.myownid"
        processor="CachedSqlEntityProcessor"
    />
</entity>


Ok now for the problem: 

At first I tried everything without the cache, but the full-import took a very 
long time, because the attributes query is pretty slow compared to the rest. As 
a result I got a processing speed of around 150 documents/s.
When switching everything to the CachedSqlEntityProcessor, the full import 
processed at a speed of 4000 documents/s.

So the full import is running quite fine. Now I wanted to use the delta import. 
When running the delta import I was expecting the ramp-up time to be about the 
same as in the full import, since I need to load the whole supplier and 
attributes tables into the cache in the first step. But when looking into the 
log file, the weird thing is that Solr seems to refresh the cache for every 
single document that is processed. So currently my delta-import is a lot slower 
than the full-import. I even tried to add the deltaImportQuery parameter to the 
entity, but it doesn't change the behavior at all (of course I know it is not 
supposed to change anything in the setup I run).

The following solutions would be possible in my opinion: 

1. Is there any way to tell the config to ignore the cache when running a delta 
import? That would help already, because we are talking about a maximum of 500 
documents changed in 15 minutes compared to over 5 million documents in total. 
2. Get Solr to not refresh the cache for every document. 

Best Regards

Constantin Wolber



solr performance problem from 4.3.0 with sorting

2013-06-20 Thread Ariel Zerbib
Hi,

We updated to version 4.3.0 from 4.2.1 and we have a performance
problem with sorting.


A query that returns 1 hit has a query time of more than 100 ms (it can be
more than 1 s), against less than 10 ms for the same query without the
sort parameter:

query with sorting option: q=level_4_id:531044&sort=level_4_id+asc
response:
<int name="QTime">1</int>
<int name="QTime">106</int>


query without sorting option: q=level_4_id:531024
<int name="QTime">1</int>
<result name="response" numFound="1" start="0">

The field level_4_id is unique and defined as a long.

In version 4.2.1, the performance of the two queries was identical. Version
4.3.1 shows the same behavior as version 4.3.0.

Thanks,
Ariel


Re: UnInverted multi-valued field

2013-06-20 Thread Bernd Fehling
Hello,

On 20.06.2013 09:34, Jochen Lienhard wrote:
 Hello,
 
 well ... we have 5 multi-valued facet fields, so you had to wait sometimes up 
 to one minute.
 
 The old searcher blocks during this time.

Maybe this is related to the already fixed SOLR-4589 issue?

Generally there is no blocking by the old searcher.
It just feels like blocking because the system is busy with your tons of 
autowarming, so the old searcher has no chance to answer queries.

 
 @Toke Eskildsen: the example I posted was a very small update, usually there 
 are more terms.
 
 We are using Solr 3.6. I don't know if it will be faster with 4.x.

DocValues were introduced in Solr 4.2.

It is always good to use a more recent version because of improvements, bug 
fixes, new features, ...

 
 These are the configurations of our cache:
 
 <filterCache
   class="solr.FastLRUCache"
   size="300000"
   initialSize="300000"
   autowarmCount="50000"/>
 
 <queryResultCache
   class="solr.LRUCache"
   size="100000"
   initialSize="100000"
   autowarmCount="50000"/>
 
 <documentCache
   class="solr.LRUCache"
   size="50000"
   initialSize="50000"
   autowarmCount="10000"/>
 
 We have 5 million document in our index.

So how have you calculated these values?
Looks like they were just set by chance.

autowarmCount depends on what your system is serving and how it is configured.
For example, on my system with 46 million docs I have autowarmCount=0 for all
caches and do static warming instead.
Why static warming? Because I have a static system.
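
For illustration, static warming is just a newSearcher listener in
solrconfig.xml, roughly like this (the query values are only placeholders; the
facet field is taken from Jochen's log line):

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
        <str name="facet.field">mt_facet</str>
      </lst>
    </arr>
  </listener>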

As an example, you have queryResultCache set to 100000, which means that IF you
calculate with 100 qps (which is a lot) it will cache the last 1000 seconds (if
all queries are unique), which is 16.6 minutes. Is that what you want and what
your system should serve?
And with half of that (50000) a new searcher should be warmed?

Regards,
Bernd

-- 
*
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)               LibTec - Library Technology
Universitätsstr. 25              and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060            bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Informal poll on running Solr 4 on Java 7 with G1GC

2013-06-20 Thread Bernd Fehling
On 20.06.2013 00:18, Timothy Potter wrote:
 I'm sure there's some site to do this but wanted to get a feel for
 who's running Solr 4 on Java 7 with G1 gc enabled?
 
 Cheers,
 Tim
 

Currently using Solr 4.2.1 in production with Oracle Java(TM) SE Runtime 
Environment (build 1.7.0_07-b10) and using G1GC without any options, on Linux 
2.6.32.23-0.3-xen SMP x86_64 GNU/Linux.
It performs better than CMS (with several tuning options): very little sawtooth, 
smaller and faster GCs.
1 master / 3 slave system, 128.97 GB index, 46.3 million docs.

Bernd


Re: yet another optimize question

2013-06-20 Thread Jack Krupansky
Take a look at using DocValues for facets that are problematic. It not only 
moves the memory off-heap, it also stores the values in a much more optimal 
manner.
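
A minimal sketch of what that looks like in schema.xml (the field name is just
an example; docValues needs Solr 4.2+ and a reindex):

  <field name="category_facet" type="string" indexed="true" stored="false"
         docValues="true"/>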


-- Jack Krupansky

-Original Message- 
From: Toke Eskildsen

Sent: Thursday, June 20, 2013 3:26 AM
To: solr-user@lucene.apache.org
Subject: RE: yet another optimize question

Petersen, Robert [robert.peter...@mail.rakuten.com] wrote:

We actually have hundreds of facet-able fields, but most are specialized
and are only faceted upon if the user has drilled into the particular 
category

to which they are applicable and so they are only indexed for products
in those categories.  I guess it is the facets that eat up so much of our
memory.


As Andre mentions, the problem is that the fc facet method maintains a list 
of values (or pointers to values, if we're talking text) for each document 
in the whole index. Faceting on a field that only has a single value in a 
single document in the whole index still allocates memory linear to the 
total number of documents. You are in the same situation as John Nielsen in 
the thread Solr using a ridiculous amount of memory 
http://lucene.472066.n3.nabble.com/Solr-using-a-ridiculous-amount-of-memory-tt4050840.html#none


You could try and change the way you index the facet information to get 
around this waste, but it is quite a lot of work:

http://sbdevel.wordpress.com/2013/04/16/you-are-faceting-itwrong/


It was suggested that if I use facet method = enum for those particular
specialized facets then my memory usage would go down.


If the number of unique values in the individual facets is low, this could 
work. If nothing else, it is very easy to try.


- Toke Eskildsen



Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor

2013-06-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is possible to create two separate root entities: one for full-import
and another for delta-import. For the delta-import you can skip the cache that way.
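
An untested sketch of the idea (entity names here are made up; the attributes
entity would be handled the same way as supplier):

  <document>
    <entity name="articleFull" rootEntity="true" dataSource="ds1"
            query="SELECT * FROM article" pk="myownid">
      <field column="myownid" name="id"/>
      <entity name="supplier" dataSource="ds2"
              query="SELECT * FROM supplier WHERE status=1"
              processor="CachedSqlEntityProcessor"
              cacheKey="SUPPLIER_ID"
              cacheLookup="articleFull.ARTICLE_SUPPLIER_ID"/>
    </entity>

    <entity name="articleDelta" rootEntity="true" dataSource="ds1"
            query="SELECT * FROM article"
            deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
            deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}"
            pk="myownid">
      <field column="myownid" name="id"/>
      <entity name="supplier" dataSource="ds2"
              query="SELECT * FROM supplier WHERE status=1 AND SUPPLIER_ID='${articleDelta.ARTICLE_SUPPLIER_ID}'"/>
    </entity>
  </document>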



On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber 
constantin.wol...@medicalcolumbus.de wrote:

 Hi,

 i searched for a solution for quite some time but did not manage to find
 some real hints on how to fix it.


 I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in a
 tomcat 6 container.

 My data import setup is basically the following:

 Data-config.xml:

 entity
 name=article
 dataSource=ds1
 query=SELECT * FROM article
 deltaQuery=SELECT myownid FROM articleHistory WHERE modified_date
 gt; '${dih.last_index_time}
 deltaImportQuery=SELECT * FROM article WHERE
 myownid=${dih.delta.myownid}
 pk=myownid
 field column=myownid name=id/

 entity
 name=supplier
 dataSource=ds2
 query=SELECT * FROM supplier WHERE status=1
 processor=CachedSqlEntityProcessor
 cacheKey=SUPPLIER_ID
 cacheLookup=article.ARTICLE_SUPPLIER_ID
 /entity

 entity
 name=attributes
 dataSource=ds1
 query=SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+'
 Value:'+ATTRIBUTE_VALUE FROM attributes
 cacheKey=ARTICLE_ID
 cacheLookup=article.myownid
 processor=CachedSqlEntityProcessor
 /entity
 /entity


 Ok now for the problem:

 At first I tried everything without the Cache. But the full-import took a
 very long time. Because the attributes query is pretty slow compared to the
 rest. As a result I got a processing speed of around 150 Documents/s.
 When switching everything to the CachedSqlEntityProcessor the full import
 processed at the speed of 4000 Documents/s

 So full import is running quite fine. Now I wanted to use the delta
 import. When running the delta import I was expecting the ramp up time to
 be about the same as in full import since I need to load the whole table
 supplier and attributes to the cache in the first step. But when looking
 into the log file the weird thing is solr seems to refresh the Cache for
 every single document that is processed. So currently my delta-import is a
 lot slower than the full-import. I even tried to add the deltaImportQuery
 parameter to the entity but it doesn't change the behavior at all (of
 course I know it is not supposed to change anything in the setup I run).

 The following solutions would be possible in my opinion:

 1. Is there any way to tell the config to ignore the Cache when running a
 delta import? That would help already because we are talking about the
 maximum of 500 documents changed in 15 minutes compared to over 5 million
 documents in total.
 2. Get solr to not refresh the cash for every document.

 Best Regards

 Constantin Wolber




-- 
-
Noble Paul


Re: solr performance problem from 4.3.0 with sorting

2013-06-20 Thread Shane Perry
Ariel,

I just ran into a similar issue when upgrading from 3.6.1 to 4.3.0.
In my case, my solrconfig.xml for 4.3.0 (which was based on my 3.6.1 file)
did not provide a newSearcher or firstSearcher warming query. After adding
a query to each listener, my query speeds drastically increased. Check
your config file, and if you aren't defining such a query (make sure to sort it
on the field in question), do so.
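
For example, something roughly like this in solrconfig.xml (the sort field is
taken from Ariel's mail; adjust to your own schema):

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="sort">level_4_id asc</str>
      </lst>
    </arr>
  </listener>

The same block under event="newSearcher" keeps the sort field warm after each
commit.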

Shane

On Thu, Jun 20, 2013 at 3:45 AM, Ariel Zerbib ariel.zer...@gmail.comwrote:

 Hi,

 We updated to version 4.3.0 from 4.2.1 and we have some performance
 problem with the sorting.


 A query that returns 1 hits has a query time more than 100ms (can be
 more than 1s) against less than 10ms for the same query without the
 sort parameter:

 query with sorting option:
 q=level_4_id:531044sort=level_4_id+asc
 response:
 - int name=QTime1/int
 - int name=QTime106/int


 query without sorting option: q=level_4_id:531024
 - int name=QTime1/int
 - result name=response numFound=1 start=0

 the field level_4_id is unique and defined as a long.

 In version 4.2.1, the performances were identical. The 4.3.1 version
 has the same behavior than the version 4.3.0.

 Thanks,
 Ariel



Steps for creating a custom query parser and search component

2013-06-20 Thread Juha Haaga
Hello list followers,

I need to write a custom Solr query parser and a search component. The 
requirements for the component are that the raw query that may need to be split 
into separate Solr queries is in a proprietary format encoded in JSON, and the 
output is also going to be in a similar proprietary JSON format. I would like 
some advice on how to get started.

Which base classes should I start to work with? I have been looking at the 
plugin classes and my initial thoughts are along the lines of following 
workflow:

1. Subclass (QParser?) and write a new parser method that knows how to deal 
with the input format.
2. Subclass (SolrQueryRequestBase?) or use LocalSolrQueryRequest like in the 
TestHarness.makeRequest() and use it to execute the required queries.
3. Compile the aggregate results as specified in the query. 
4. Use some existing component (?) for returning the results to the user.
5. Put these components in steps 1-4 together into (?) so that it can be added 
to solrconfig.xml as a custom query parser accessible at 
http://solr/core/customparser

Is my approach reasonable, or am I overlooking some canonical way of achieving 
what I need to do? What and where do I need to look into to replace the 
question marks in my plan with knowledge? :)

-- Juha



Getting the String which matched in the document as response

2013-06-20 Thread Prathik Puthran
Hi,

Is it possible to get the exact matched string in the index in the select
response of Solr?

For example: if the search query is "Hello World" and the query parser uses OR,
Solr would return all documents which matched both "Hello" and "World", only
"Hello", or only "World".
Now I want to know which of the returned documents matched both "Hello" and
"World" and which of them matched only "Hello" or "World".

Is it possible to get this info?

Thanks,
Prathik


Re: Informal poll on running Solr 4 on Java 7 with G1GC

2013-06-20 Thread John Nielsen

  
  
We used to use G1, but recently went back to CMS.

G1 gave us too long stop-the-world events. CMS uses more resources for the
same work, but it is more predictable and we get better worst-case
performance out of it.

Med venlig hilsen / Best regards

John Nielsen
Programmer

MCB A/S
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


On 20-06-2013 00:18, Timothy Potter wrote:

I'm sure there's some site to do this but wanted to get a feel for
who's running Solr 4 on Java 7 with G1 gc enabled?

Cheers,
Tim



Re: Getting the String which matched in the document as response

2013-06-20 Thread Jack Krupansky
Take a look at the explain section when you add the debugQuery=true 
parameter. You can additionally set debug.explain.structured=true to get the 
scoring explanation in XML if parsing the text is a problem for you.
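
For example, something along these lines (host, core and query are
placeholders):

  http://localhost:8983/solr/collection1/select?q=hello+world&debugQuery=true&debug.explain.structured=true

Each document's entry in the explain section lists the terms that contributed
to its score, which tells you whether it matched one of the terms or both.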


-- Jack Krupansky

-Original Message- 
From: Prathik Puthran

Sent: Thursday, June 20, 2013 9:55 AM
To: solr-user@lucene.apache.org
Subject: Getting the String which matched in the document as response

Hi,

Is it possible to get the exact matched string in the index in the select
response of Solr.

For eg : If the search query is Hello World and the query parser is OR
solr would return all documents which matched both Hello World, only
Hello or only World.
Now I want to know which of the returned documents matched both Hello
World and which of them matched only Hello or World.

Is it possible to get this info?

Thanks,
Prathik 



Re: Steps for creating a custom query parser and search component

2013-06-20 Thread Jack Krupansky
First, my standard admonition: DON'T DO IT!!! Try harder to use the features 
Solr provides before trying to shoehorn even more code into Solr.


And... think again about whether this code needs to be inside of Solr as 
opposed to simply doing multiple requests in a clean, RESTful application 
layer that is completely under your own control.


Those disclaimers out of the way...

Start by studying any of the existing query parser plugins - AND its unit 
tests.


Ditto with search components.

Keep studying until you have specific questions.
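
When you do get that far, the wiring in solrconfig.xml ends up looking roughly
like this (the class and component names here are invented placeholders):

  <queryParser name="myjson" class="com.example.MyJsonQParserPlugin"/>
  <searchComponent name="myjson-aggregator" class="com.example.MyAggregatingComponent"/>

  <requestHandler name="/customparser" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">myjson</str>
    </lst>
    <arr name="last-components">
      <str>myjson-aggregator</str>
    </arr>
  </requestHandler>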

-- Jack Krupansky

-Original Message- 
From: Juha Haaga

Sent: Thursday, June 20, 2013 9:32 AM
To: solr-user@lucene.apache.org
Subject: Steps for creating a custom query parser and search component

Hello list followers,

I need to write a custom Solr query parser and a search component. The 
requirements for the component are that the raw query that may need to be 
split into separate Solr queries is in a proprietary format encoded in JSON, 
and the output is also going to be in a similar proprietary JSON format. I 
would like some advice on how to get started.


Which base classes should I start to work with? I have been looking at the 
plugin classes and my initial thoughts are along the lines of following 
workflow:


1. Subclass (QParser?) and write a new parser method that knows how to deal 
with the input format.
2. Subclass (SolrQueryRequestBase?) or use LocalSolrQueryRequest like in the 
TestHarness.makeRequest() and use it to execute the required queries.

3. Compile the aggregate results as specified in the query.
4. Use some existing component (?) for returning the results to the user.
5. Put these components in steps 1-4 together into (?) so that it can be 
added to solrconfig.xml as a custom query parser accessible at 
http://solr/core/customparser


Is my approach reasonable, or am I overlooking some canonical way of 
achieving what I need to do? What and where do I need to look into to 
replace the question marks in my plan with knowledge? :)


-- Juha 



RE: Steps for creating a custom query parser and search component

2013-06-20 Thread Swati Swoboda
Hi Juha,

If it's just a matter of format, have you considered adding another layer in 
front of Solr, where you've got a class that just takes in your queries in the 
proprietary format and then converts them to what Solr needs? Similarly, if you 
need your results in a particular format, just convert them again. I would 
imagine that'd be a lot simpler than subclassing Solr classes.

Swati

-Original Message-
From: Juha Haaga [mailto:juha.ha...@codenomicon.com] 
Sent: Thursday, June 20, 2013 9:33 AM
To: solr-user@lucene.apache.org
Subject: Steps for creating a custom query parser and search component

Hello list followers,

I need to write a custom Solr query parser and a search component. The 
requirements for the component are that the raw query that may need to be split 
into separate Solr queries is in a proprietary format encoded in JSON, and the 
output is also going to be in a similar proprietary JSON format. I would like 
some advice on how to get started.

Which base classes should I start to work with? I have been looking at the 
plugin classes and my initial thoughts are along the lines of following 
workflow:

1. Subclass (QParser?) and write a new parser method that knows how to deal 
with the input format.
2. Subclass (SolrQueryRequestBase?) or use LocalSolrQueryRequest like in the 
TestHarness.makeRequest() and use it to execute the required queries.
3. Compile the aggregate results as specified in the query. 
4. Use some existing component (?) for returning the results to the user.
5. Put these components in steps 1-4 together into (?) so that it can be added 
to solrconfig.xml as a custom query parser accessible at 
http://solr/core/customparser

Is my approach reasonable, or am I overlooking some canonical way of achieving 
what I need to do? What and where do I need to look into to replace the 
question marks in my plan with knowledge? :)

-- Juha



AW: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor

2013-06-20 Thread Constantin Wolber
Hi,

thanks for the answer, but I'm a little bit confused about what you are 
suggesting. 
I have not really used the rootEntity attribute before, but from what I read in 
the documentation, as far as I can tell that would result in two documents 
(maybe with the same id, which would probably result in only one document being 
stored), one for each root entity.

It would be great if you could just sketch the setup with the entities I 
provided, because currently I have no idea how to do it. 

Regards

Constantin


-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com] 
Sent: Thursday, 20 June 2013 15:42
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler: Problems with delta-import and 
CachedSqlEntityProcessor

it is possible to create two separate root entities . one for full-import and 
another for delta. for the delta-import you can skip Cache that way



On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber  
constantin.wol...@medicalcolumbus.de wrote:

 Hi,

 i searched for a solution for quite some time but did not manage to 
 find some real hints on how to fix it.


 I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in 
 a tomcat 6 container.

 My data import setup is basically the following:

 Data-config.xml:

 entity
 name=article
 dataSource=ds1
 query=SELECT * FROM article
 deltaQuery=SELECT myownid FROM articleHistory WHERE 
 modified_date gt; '${dih.last_index_time}
 deltaImportQuery=SELECT * FROM article WHERE 
 myownid=${dih.delta.myownid}
 pk=myownid
 field column=myownid name=id/

 entity
 name=supplier
 dataSource=ds2
 query=SELECT * FROM supplier WHERE status=1
 processor=CachedSqlEntityProcessor
 cacheKey=SUPPLIER_ID
 cacheLookup=article.ARTICLE_SUPPLIER_ID
 /entity

 entity
 name=attributes
 dataSource=ds1
 query=SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+'
 Value:'+ATTRIBUTE_VALUE FROM attributes
 cacheKey=ARTICLE_ID
 cacheLookup=article.myownid
 processor=CachedSqlEntityProcessor
 /entity
 /entity


 Ok now for the problem:

 At first I tried everything without the Cache. But the full-import 
 took a very long time. Because the attributes query is pretty slow 
 compared to the rest. As a result I got a processing speed of around 150 
 Documents/s.
 When switching everything to the CachedSqlEntityProcessor the full 
 import processed at the speed of 4000 Documents/s

 So full import is running quite fine. Now I wanted to use the delta 
 import. When running the delta import I was expecting the ramp up time 
 to be about the same as in full import since I need to load the whole 
 table supplier and attributes to the cache in the first step. But when 
 looking into the log file the weird thing is solr seems to refresh the 
 Cache for every single document that is processed. So currently my 
 delta-import is a lot slower than the full-import. I even tried to add 
 the deltaImportQuery parameter to the entity but it doesn't change the 
 behavior at all (of course I know it is not supposed to change anything in 
 the setup I run).

 The following solutions would be possible in my opinion:

 1. Is there any way to tell the config to ignore the Cache when 
 running a delta import? That would help already because we are talking 
 about the maximum of 500 documents changed in 15 minutes compared to 
 over 5 million documents in total.
 2. Get solr to not refresh the cash for every document.

 Best Regards

 Constantin Wolber




--
-
Noble Paul


AW: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor

2013-06-20 Thread Constantin Wolber
Hi,

I may have been a little too fast with my response. 

After reading a bit more, I imagine you meant running the full-import with the 
entity param set to the full-import root entity, and running the delta-import 
with the entity param set to the delta entity. Is that correct?

Regards

Constantin


-Original Message-
From: Constantin Wolber [mailto:constantin.wol...@medicalcolumbus.de] 
Sent: Thursday, 20 June 2013 16:42
To: solr-user@lucene.apache.org
Subject: AW: DataImportHandler: Problems with delta-import and 
CachedSqlEntityProcessor

Hi,

and thanks for the answer. But I'm a little bit confused about what you are 
suggesting. 
I did not really use the rootEntity attribute before. But from what I read in 
the documentation as far as I can tell that would result in two documents 
(maybe with the same id which would probably result in only one document being 
stored) because one for each root entity.

It would be great if you could just sketch the setup with the entities I 
provided. Because currently I have no idea on how to do it. 

Regards

Constantin


-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com]
Sent: Thursday, 20 June 2013 15:42
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler: Problems with delta-import and 
CachedSqlEntityProcessor

it is possible to create two separate root entities . one for full-import and 
another for delta. for the delta-import you can skip Cache that way



On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber  
constantin.wol...@medicalcolumbus.de wrote:

 Hi,

 i searched for a solution for quite some time but did not manage to 
 find some real hints on how to fix it.


 I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in 
 a tomcat 6 container.

 My data import setup is basically the following:

 Data-config.xml:

 entity
 name=article
 dataSource=ds1
 query=SELECT * FROM article
 deltaQuery=SELECT myownid FROM articleHistory WHERE 
 modified_date gt; '${dih.last_index_time}
 deltaImportQuery=SELECT * FROM article WHERE 
 myownid=${dih.delta.myownid}
 pk=myownid
 field column=myownid name=id/

 entity
 name=supplier
 dataSource=ds2
 query=SELECT * FROM supplier WHERE status=1
 processor=CachedSqlEntityProcessor
 cacheKey=SUPPLIER_ID
 cacheLookup=article.ARTICLE_SUPPLIER_ID
 /entity

 entity
 name=attributes
 dataSource=ds1
 query=SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+'
 Value:'+ATTRIBUTE_VALUE FROM attributes
 cacheKey=ARTICLE_ID
 cacheLookup=article.myownid
 processor=CachedSqlEntityProcessor
 /entity
 /entity


 Ok now for the problem:

 At first I tried everything without the Cache. But the full-import 
 took a very long time. Because the attributes query is pretty slow 
 compared to the rest. As a result I got a processing speed of around 150 
 Documents/s.
 When switching everything to the CachedSqlEntityProcessor the full 
 import processed at the speed of 4000 Documents/s

 So full import is running quite fine. Now I wanted to use the delta 
 import. When running the delta import I was expecting the ramp up time 
 to be about the same as in full import since I need to load the whole 
 table supplier and attributes to the cache in the first step. But when 
 looking into the log file the weird thing is solr seems to refresh the 
 Cache for every single document that is processed. So currently my 
 delta-import is a lot slower than the full-import. I even tried to add 
 the deltaImportQuery parameter to the entity but it doesn't change the 
 behavior at all (of course I know it is not supposed to change anything in 
 the setup I run).

 The following solutions would be possible in my opinion:

 1. Is there any way to tell the config to ignore the Cache when 
 running a delta import? That would help already because we are talking 
 about the maximum of 500 documents changed in 15 minutes compared to 
 over 5 million documents in total.
 2. Get solr to not refresh the cash for every document.

 Best Regards

 Constantin Wolber




--
-
Noble Paul


Re: update solr.xml dynamically to add new cores

2013-06-20 Thread Michael Della Bitta
Hi,

I wouldn't edit solr.xml directly, for two reasons. One is that an
already running Solr installation won't pick up changes to that file,
and might actually overwrite the changes that you make to it. The other is that
it's going away in a future release of Solr.

Instead, I'd keep the package that installs the Solr webapp and brings it
up as you described, and have your independent index packages use either
the CoreAdmin API or the Collections API to create the indexes, depending on
whether you're using SolrCloud or not:

http://wiki.apache.org/solr/CoreAdmin
https://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
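
For instance, a postinst script could do no more than call the CoreAdmin API
(the paths and names below are placeholders):

  curl "http://localhost:8080/solr/admin/cores?action=CREATE&name=index1&instanceDir=/var/lib/solr/index1&config=solrconfig.xml&schema=schema.xml"

That keeps each index package self-contained and avoids touching solr.xml at
all.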



Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Wed, Jun 19, 2013 at 8:27 PM, smanad sma...@gmail.com wrote:

 Hi,
 Is there a way to edit solr.xml as a part of debian package installation to
 add new cores.
 In my use case, there 4 solr indexes and they are managed/configured by
 different teams.
 The way I am thinking packages will work is as described below,
 1. There will be a solr-base debian package which comes with solr
 installtion with tomcat setup (I am planning to use solr 4.3)
 2. There will be individual index debian packages like,
 solr-index1, solr-index2 which will be dependent on solr-base.
 Each package's DEBIAN postinst script will have a logic to edit solr.xml to
 add new index like index1, index2, etc.

 Does this sound good? or is there a better/different way to do this?
 Any pointers will be much appreciated.
 Thanks,
 -M



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/update-solr-xml-dynamically-to-add-new-cores-tp4071800.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud - Score calculation

2013-06-20 Thread Learner
Thanks for your response. 

So in the case of SolrCloud, Solr/ZooKeeper takes care of managing the indexing
and searching. In that case I assume most of the shards will be of equal
size (I am just going to push the data to a leader). I assume IDF won't be a
big issue then, since the shard sizes are almost equal... Am I right?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Score-calculation-tp4071805p4071900.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud - Score calculation

2013-06-20 Thread Jack Krupansky
Even if shards are exactly the same size, the distribution of terms may not 
be equal in each shard. But, yes, if shard size and term distribution are 
equal, then IDF should be comparable across shards, sort of.


-- Jack Krupansky

-Original Message- 
From: Learner

Sent: Thursday, June 20, 2013 11:05 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud - Score calculation

Thanks for your response.

So in case of SolrCloud, SOLR/zookeeper takes care of managing the indexing
/ searching. So in that case I assume most of the shards will be of equal
size (I am just going to push the data to a leader). I assume IDF wont be a
big issue then since the shards size are almost equal... Am I right?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Score-calculation-tp4071805p4071900.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Get all values from a field

2013-06-20 Thread It-forum

Hello,

I'm looking to retrieve all distinct values of a specific field.

My documents have fields like:
id
name
cat
ref
model
brand

I wish to be able to retrieve all distinct values of cat.

How could I do that with Solr? I'm totally blocked.

Please help.

Regards

David


Re: Get all values from a field

2013-06-20 Thread Erik Hatcher
David -

This is effectively faceting. If you want to see all cat values across all 
documents, do /select?q=*:*&rows=0&facet=on&facet.field=cat and you'll get what 
you're looking for.
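
The distinct values come back under facet_counts/facet_fields, roughly like
this (the values are invented for illustration):

  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="cat">
        <int name="electronics">12</int>
        <int name="books">7</int>
        ...
      </lst>
    </lst>
  </lst>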

Erik

On Jun 20, 2013, at 11:35 , It-forum wrote:

 Hello,
 
 I'm looking to retreive all distinct values of a specific field.
 
 My documents have field like :
 id
 name
 cat
 ref
 model
 brand
 
 I wich to be able to retreive all cat distinct values.
 
 How could I do that with Solr, I'm totally block
 
 Please help
 
 Regards
 
 David



Re: Informal poll on running Solr 4 on Java 7 with G1GC

2013-06-20 Thread Shawn Heisey

On 6/20/2013 8:02 AM, John Nielsen wrote:

We used to use G1, but recently went back to CMS.

G1 gave us too long stop-the-world events. CMS uses more ressources for
the same work, but it is more predictable and we get better worst-case
performance out of it.


This is exactly the behavior I saw.  When you take a look at the overall 
stats and the memory graph over time, G1 looks way better. 
Unfortunately GC with any collector does sometimes get bad, and when 
that happens, un-tuned G1 is a little worse than un-tuned CMS.  Perhaps 
if G1 were tuned, it would be really good, but I haven't been able to 
find any information on how to tune G1.


jHiccup or gclogviewer can give you really good insight into how your GC 
is doing in both average and worst-case scenarios.  jHiccup is a wrapper 
for your program and gclogviewer draws graphs from GC logs.  I'm not 
sure whether gclogviewer works with G1 logs or not, but I know that 
jHiccup will work with G1.


http://www.azulsystems.com/downloads/jHiccup
http://code.google.com/p/gclogviewer/downloads/list
http://code.google.com/p/gclogviewer/source/checkout
http://code.google.com/p/gclogviewer/issues/detail?id=7

Thanks,
Shawn



Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor

2013-06-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
Yes, that's right.
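
So, with entity names like articleFull and articleDelta (placeholders), the two
calls would look roughly like:

  http://localhost:8080/solr/collection1/dataimport?command=full-import&entity=articleFull
  http://localhost:8080/solr/collection1/dataimport?command=delta-import&entity=articleDelta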


On Thu, Jun 20, 2013 at 8:16 PM, Constantin Wolber 
constantin.wol...@medicalcolumbus.de wrote:

 Hi,

 i may have been a little to fast with my response.

 After reading a bit more I imagine you meant running the full-import with
 the entity param for the root entity for full import. And running the delta
 import with the entity param for the delta entity. Is that correct?

 Regards

 Constantin


 -Ursprüngliche Nachricht-
 Von: Constantin Wolber [mailto:constantin.wol...@medicalcolumbus.de]
 Gesendet: Donnerstag, 20. Juni 2013 16:42
 An: solr-user@lucene.apache.org
 Betreff: AW: DataImportHandler: Problems with delta-import and
 CachedSqlEntityProcessor

 Hi,

 and thanks for the answer. But I'm a little bit confused about what you
 are suggesting.
 I did not really use the rootEntity attribute before. But from what I read
 in the documentation as far as I can tell that would result in two
 documents (maybe with the same id which would probably result in only one
 document being stored) because one for each root entity.

 It would be great if you could just sketch the setup with the entities I
 provided. Because currently I have no idea on how to do it.

 Regards

 Constantin


 -Ursprüngliche Nachricht-
 Von: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com]
 Gesendet: Donnerstag, 20. Juni 2013 15:42
 An: solr-user@lucene.apache.org
 Betreff: Re: DataImportHandler: Problems with delta-import and
 CachedSqlEntityProcessor

 it is possible to create two separate root entities . one for full-import
 and another for delta. for the delta-import you can skip Cache that way



 On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber 
 constantin.wol...@medicalcolumbus.de wrote:

  Hi,
 
  i searched for a solution for quite some time but did not manage to
  find some real hints on how to fix it.
 
 
  I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in
  a tomcat 6 container.
 
  My data import setup is basically the following:
 
  Data-config.xml:
 
  entity
  name=article
  dataSource=ds1
  query=SELECT * FROM article
  deltaQuery=SELECT myownid FROM articleHistory WHERE
  modified_date gt; '${dih.last_index_time}
  deltaImportQuery=SELECT * FROM article WHERE
  myownid=${dih.delta.myownid}
  pk=myownid
  field column=myownid name=id/
 
  entity
  name=supplier
  dataSource=ds2
  query=SELECT * FROM supplier WHERE status=1
  processor=CachedSqlEntityProcessor
  cacheKey=SUPPLIER_ID
  cacheLookup=article.ARTICLE_SUPPLIER_ID
  /entity
 
  entity
  name=attributes
  dataSource=ds1
  query=SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+'
  Value:'+ATTRIBUTE_VALUE FROM attributes
  cacheKey=ARTICLE_ID
  cacheLookup=article.myownid
  processor=CachedSqlEntityProcessor
  /entity
  /entity
 
 
  Ok now for the problem:
 
  At first I tried everything without the Cache. But the full-import
  took a very long time. Because the attributes query is pretty slow
  compared to the rest. As a result I got a processing speed of around 150
 Documents/s.
  When switching everything to the CachedSqlEntityProcessor the full
  import processed at the speed of 4000 Documents/s
 
  So full import is running quite fine. Now I wanted to use the delta
  import. When running the delta import I was expecting the ramp up time
  to be about the same as in full import since I need to load the whole
  table supplier and attributes to the cache in the first step. But when
  looking into the log file the weird thing is solr seems to refresh the
  Cache for every single document that is processed. So currently my
  delta-import is a lot slower than the full-import. I even tried to add
  the deltaImportQuery parameter to the entity but it doesn't change the
  behavior at all (of course I know it is not supposed to change anything
 in the setup I run).
 
  The following solutions would be possible in my opinion:
 
  1. Is there any way to tell the config to ignore the Cache when
  running a delta import? That would help already because we are talking
  about the maximum of 500 documents changed in 15 minutes compared to
  over 5 million documents in total.
  2. Get solr to not refresh the cash for every document.
 
  Best Regards
 
  Constantin Wolber
 
 


 --
 -
 Noble Paul




-- 
-
Noble Paul


Need help on Solr

2013-06-20 Thread Abhishek Bansal
Hello,

I am trying to index a PDF file in Solr. I am currently running Solr on
Apache Tomcat 6.

When I try to index it I get the error below. Please help; I was not able to
rectify this error with the help of the internet.




ERROR - 2013-06-20 20:43:41.549; org.apache.solr.core.CoreContainer; Unable
to create core: collection1
org.apache.solr.common.SolrException: [schema.xml] Duplicate field
definition for 'id'
[[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
required=true}]]] and
[[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
required=true}]]]
 at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:176)
 at
org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
at
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
 at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
ERROR - 2013-06-20 20:43:41.551; org.apache.solr.common.SolrException;
null:org.apache.solr.common.SolrException: Unable to create core:
collection1
at
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: [schema.xml] Duplicate
field definition for 'id'
[[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
required=true}]]] and
[[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
required=true}]]]
 at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:176)
 at
org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
at
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
 at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
 ... 10 more

INFO  - 2013-06-20 20:43:41.553;
org.apache.solr.servlet.SolrDispatchFilter; user.dir=C:\Program
Files\Apache Software Foundation\Tomcat 6.0
INFO  - 2013-06-20 20:43:41.553;
org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init() done
ERROR - 2013-06-20 20:43:41.820; org.apache.solr.common.SolrException;
null:org.apache.solr.common.SolrException: SolrCore 'collection1' is not
available due to init failure: [schema.xml] Duplicate field definition for
'id'
[[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
required=true}]]] and
[[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
required=true}]]]
 at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1212)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at

Re: Need help on Solr

2013-06-20 Thread Shreejay
org.apache.solr.common.SolrException: [schema.xml] Duplicate field
definition for 'id'

You might have defined an id field in the schema file. The out-of-the-box schema 
file already contains an id field.

-- 
Shreejay


On Thursday, June 20, 2013 at 9:16, Abhishek Bansal wrote:

 Hello,
 
 I am trying to index a pdf file on Solr. I am running icurrently Solr on
 Apache Tomcat 6.
 
 When I try to index it I get below error. Please help. I was not able to
 rectify this error with help of internet.
 
 
 
 
 ERROR - 2013-06-20 20:43:41.549; org.apache.solr.core.CoreContainer; Unable
 to create core: collection1
 org.apache.solr.common.SolrException: [schema.xml] Duplicate field
 definition for 'id'
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
 required=true}]]] and
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
 required=true}]]]
 at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
 at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:176)
 at
 org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
 at
 org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
 at
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:662)
 ERROR - 2013-06-20 20:43:41.551; org.apache.solr.common.SolrException;
 null:org.apache.solr.common.SolrException: Unable to create core:
 collection1
 at
 org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
 at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.solr.common.SolrException: [schema.xml] Duplicate
 field definition for 'id'
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
 required=true}]]] and
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
 required=true}]]]
 at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
 at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:176)
 at
 org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
 at
 org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
 at
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
 ... 10 more
 
 INFO - 2013-06-20 20:43:41.553;
 org.apache.solr.servlet.SolrDispatchFilter; user.dir=C:\Program
 Files\Apache Software Foundation\Tomcat 6.0
 INFO - 2013-06-20 20:43:41.553;
 org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init() done
 ERROR - 2013-06-20 20:43:41.820; org.apache.solr.common.SolrException;
 null:org.apache.solr.common.SolrException: SolrCore 'collection1' is not
 available due to init failure: [schema.xml] Duplicate field definition for
 'id'
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
 required=true}]]] and
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
 required=true}]]]
 at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1212)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
 at
 

Re: Informal poll on running Solr 4 on Java 7 with G1GC

2013-06-20 Thread Timothy Potter
Awesome info, thanks Shawn! I'll post back my results with G1 after
we've had some time to analyze it in production.

On Thu, Jun 20, 2013 at 11:01 AM, Shawn Heisey s...@elyograg.org wrote:
 On 6/20/2013 8:02 AM, John Nielsen wrote:

 We used to use G1, but recently went back to CMS.

 G1 gave us too long stop-the-world events. CMS uses more ressources for
 the same work, but it is more predictable and we get better worst-case
 performance out of it.


 This is exactly the behavior I saw.  When you take a look at the overall
 stats and the memory graph over time, G1 looks way better. Unfortunately GC
 with any collector does sometimes get bad, and when that happens, un-tuned
 G1 is a little worse than un-tuned CMS.  Perhaps if G1 were tuned, it would
 be really good, but I haven't been able to find any information on how to
 tune G1.

 jHiccup or gclogviewer can give you really good insight into how your GC is
 doing in both average and worst-case scenarios.  jHiccup is a wrapper for
 your program and gclogviewer draws graphs from GC logs.  I'm not sure
 whether gclogviewer works with G1 logs or not, but I know that jHiccup will
 work with G1.

 http://www.azulsystems.com/downloads/jHiccup
 http://code.google.com/p/gclogviewer/downloads/list
 http://code.google.com/p/gclogviewer/source/checkout
 http://code.google.com/p/gclogviewer/issues/detail?id=7

 Thanks,
 Shawn



solr rpm

2013-06-20 Thread adamc
I am wondering why there is no official Solr RPM. 
I wish Solr released an RPM like Sphinx does:
http://sphinxsearch.com/downloads/release/






--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-rpm-tp4071905.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help on Solr

2013-06-20 Thread Abhishek Bansal
Yeah, I know; out of the box there is one id field. I removed it from
schema.xml.

I have also added the code below to automatically generate an ID.

<field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true"/>

with regards,
Abhishek Bansal


On 20 June 2013 21:49, Shreejay shreej...@gmail.com wrote:

 org.apache.solr.common.SolrException: [schema.xml] Duplicate field
 definition for 'id'

 You might have defined an id field in the schema file. The out of box
 schema file already contains an id field .

 --
 Shreejay


 On Thursday, June 20, 2013 at 9:16, Abhishek Bansal wrote:

  Hello,
 
  I am trying to index a pdf file on Solr. I am running icurrently Solr on
  Apache Tomcat 6.
 
  When I try to index it I get below error. Please help. I was not able to
  rectify this error with help of internet.
 
 
 
 
  ERROR - 2013-06-20 20:43:41.549; org.apache.solr.core.CoreContainer;
 Unable
  to create core: collection1
  org.apache.solr.common.SolrException: [schema.xml] Duplicate field
  definition for 'id'
 
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
  required=true}]]] and
 
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
  required=true}]]]
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:176)
  at
 
 org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
  at
 
 org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
  at
 
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:662)
  ERROR - 2013-06-20 20:43:41.551; org.apache.solr.common.SolrException;
  null:org.apache.solr.common.SolrException: Unable to create core:
  collection1
  at
 
 org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:662)
  Caused by: org.apache.solr.common.SolrException: [schema.xml] Duplicate
  field definition for 'id'
 
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
  required=true}]]] and
 
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
  required=true}]]]
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:176)
  at
 
 org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
  at
 
 org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
  at
 
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
  ... 10 more
 
  INFO - 2013-06-20 20:43:41.553;
  org.apache.solr.servlet.SolrDispatchFilter; user.dir=C:\Program
  Files\Apache Software Foundation\Tomcat 6.0
  INFO - 2013-06-20 20:43:41.553;
  org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init()
 done
  ERROR - 2013-06-20 20:43:41.820; org.apache.solr.common.SolrException;
  null:org.apache.solr.common.SolrException: SolrCore 'collection1' is not
  available due to init failure: [schema.xml] Duplicate field definition
 for
  'id'
 
 

Re: Need help on Solr

2013-06-20 Thread Abhishek Bansal
As I am running Solr on Windows + Tomcat, I am using the command below to index
the pdf. I hope this command is not faulty. Please check:

java -jar -Durl="http://localhost:8080/solr-4.3.0/update/extract?literal.id=1&commit=true"
post.jar sample.pdf
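
For reference, an equivalent curl invocation (same URL; the form-field name
"myfile" is arbitrary) would be something like:

    curl "http://localhost:8080/solr-4.3.0/update/extract?literal.id=1&commit=true" \
         -F "myfile=@sample.pdf"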

with regards,
Abhishek Bansal


On 20 June 2013 21:56, Abhishek Bansal abhishek.bansa...@gmail.com wrote:

 Yeah I know, out of the box there is one id field. I removed it from
 schema.xml

 I have also added below code to automatically generate an ID.

 <field name="id" type="uuid" indexed="true" stored="true" default="NEW"
 multiValued="false"/>
 <field name="name" type="text_general" indexed="true" stored="true"/>
 <field name="text" type="text_general" indexed="true" stored="true"/>

 with regards,
 Abhishek Bansal


 On 20 June 2013 21:49, Shreejay shreej...@gmail.com wrote:

 org.apache.solr.common.SolrException: [schema.xml] Duplicate field
 definition for 'id'

 You might have defined an id field in the schema file. The out of box
 schema file already contains an id field .

 --
 Shreejay


 On Thursday, June 20, 2013 at 9:16, Abhishek Bansal wrote:

  Hello,
 
  I am trying to index a pdf file on Solr. I am currently running Solr on
  Apache Tomcat 6.
 
  When I try to index it I get the error below. Please help; I have not been
  able to resolve this error with help from the internet.
 
 
 
 
  ERROR - 2013-06-20 20:43:41.549; org.apache.solr.core.CoreContainer;
 Unable
  to create core: collection1
  org.apache.solr.common.SolrException: [schema.xml] Duplicate field
  definition for 'id'
 
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
  required=true}]]] and
 
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
  required=true}]]]
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:176)
  at
 
 org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
  at
 
 org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
  at
 
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:662)
  ERROR - 2013-06-20 20:43:41.551; org.apache.solr.common.SolrException;
  null:org.apache.solr.common.SolrException: Unable to create core:
  collection1
  at
 
 org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
  at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
  at java.lang.Thread.run(Thread.java:662)
  Caused by: org.apache.solr.common.SolrException: [schema.xml] Duplicate
  field definition for 'id'
 
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
  required=true}]]] and
 
 [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required,
  required=true}]]]
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:176)
  at
 
 org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
  at
 
 org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
  at
 
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
  ... 10 more
 
  INFO - 2013-06-20 20:43:41.553;
  org.apache.solr.servlet.SolrDispatchFilter; user.dir=C:\Program
  Files\Apache Software 

Re: solr rpm

2013-06-20 Thread Shawn Heisey

On 6/20/2013 9:30 AM, adamc wrote:

I am wondering why there is no official Solr RPM.
I wish Solr releases rpm like Sphinx does,
http://sphinxsearch.com/downloads/release/


I agree with you that Solr should be much easier to get running than it 
is.  There are some roadblocks, though.


Solr isn't an executable program.  It's a webapp.  It requires a java 
servlet container to run.  The Solr *example* has a slimmed down install 
of Jetty in it that you can run, but it's just that - an example.  It is 
*not* a trivial thing to create an installable package of Solr.


Packages are available for Debian and Ubuntu because those distributions 
have maintainers that have done the required work to split our package 
into smaller pieces.  They've done a very different job than you would 
see if we were to make an installer, because they are integrating Lucene 
as a separate dependency for both Solr and for other search packages.


Thanks,
Shawn



RE: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor

2013-06-20 Thread Dyer, James
Instead of specifying CachedSqlEntityProcessor, you can specify 
SqlEntityProcessor with cacheImpl='SortedMapBackedCache'.  If you 
parameterize this, to have SortedMapBackedCache for full updates but blank 
for deltas I think it will cache only on the full import.
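
A rough sketch of that parameterization (the request parameter name
"attrCache" below is made up; DIH substitutes ${dataimporter.request.*} from
the import request's URL parameters):

    <entity name="attributes"
            dataSource="ds1"
            processor="SqlEntityProcessor"
            cacheImpl="${dataimporter.request.attrCache}"
            cacheKey="ARTICLE_ID"
            cacheLookup="article.myownid"
            query="SELECT ARTICLE_ID, ... FROM attributes"/>

Then request /dataimport?command=full-import&attrCache=SortedMapBackedCache
for full imports and leave the parameter off (or empty) for deltas, which
should cache only on the full import.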

Another option is to parameterize the child queries with a where clause, so 
if it is creating a new cache with every row, the cache will only contain the 
data needed for that child row.

A third option is to do your delta imports like described here:  
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
My experience is that this generally performs better than using the delta 
import feature anyhow.  The trick is on handling deletes, which will require 
its own entity and the $deleteDocById command.  See 
http://wiki.apache.org/solr/DataImportHandler#Special_Commands
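
The core of that wiki pattern, adapted loosely to the config in this thread
(a sketch only), is a single query that imports everything on a clean
full-import and only the changed rows otherwise:

    query="SELECT * FROM article
           WHERE '${dataimporter.request.clean}' != 'false'
              OR myownid IN (SELECT myownid FROM articleHistory
                             WHERE modified_date &gt; '${dih.last_index_time}')"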

But these are all workarounds.  This sounds like a bug or some subtle 
configuration problem.  I looked through the JIRA issues and did not see 
anything like this reported yet, but if you're pretty sure you are doing 
everything correctly you may want to open a bug ticket.  Be sure to flag it as 
contrib - Dataimporthandler.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Constantin Wolber [mailto:constantin.wol...@medicalcolumbus.de] 
Sent: Thursday, June 20, 2013 3:21 AM
To: solr-user@lucene.apache.org
Subject: DataImportHandler: Problems with delta-import and 
CachedSqlEntityProcessor

Hi,

I searched for a solution for quite some time but did not manage to find any
real hints on how to fix it.


I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in a tomcat 
6 container.

My data import setup is basically the following:

Data-config.xml:

<entity
        name="article"
        dataSource="ds1"
        query="SELECT * FROM article"
        deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
        deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}"
        pk="myownid">
    <field column="myownid" name="id"/>

    <entity
            name="supplier"
            dataSource="ds2"
            query="SELECT * FROM supplier WHERE status=1"
            processor="CachedSqlEntityProcessor"
            cacheKey="SUPPLIER_ID"
            cacheLookup="article.ARTICLE_SUPPLIER_ID"
    />

    <entity
            name="attributes"
            dataSource="ds1"
            query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
            cacheKey="ARTICLE_ID"
            cacheLookup="article.myownid"
            processor="CachedSqlEntityProcessor"
    />
</entity>


Ok now for the problem: 

At first I tried everything without the Cache. But the full-import took a very 
long time. Because the attributes query is pretty slow compared to the rest. As 
a result I got a processing speed of around 150 Documents/s.
When switching everything to the CachedSqlEntityProcessor the full import 
processed at the speed of 4000 Documents/s

So full import is running quite fine. Now I wanted to use the delta import. 
When running the delta import I was expecting the ramp up time to be about the 
same as in full import since I need to load the whole table supplier and 
attributes to the cache in the first step. But when looking into the log file 
the weird thing is solr seems to refresh the Cache for every single document 
that is processed. So currently my delta-import is a lot slower than the 
full-import. I even tried to add the deltaImportQuery parameter to the entity 
but it doesn't change the behavior at all (of course I know it is not supposed 
to change anything in the setup I run).

The following solutions would be possible in my opinion: 

1. Is there any way to tell the config to ignore the cache when running a delta
import? That would help already, because we are talking about a maximum of 500
documents changed in 15 minutes compared to over 5 million documents in total.
2. Get Solr to not refresh the cache for every document.

Best Regards

Constantin Wolber





Re: solr rpm

2013-06-20 Thread adamc
Thanks Shawn for explaining so fully.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-rpm-tp4071905p4071941.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: solr rpm

2013-06-20 Thread Boogie Shafer
there is an rpm build framework for building a jetty powered solr rpm here if 
you are interested
https://github.com/boogieshafer/jetty-solr-rpm

It's currently set up for Solr 4.3.0 + the built-in Jetty example + a Jetty start script
and configs + JMX + logging via the logback framework.

edit the build script and spec file to suit your needs


From: Shawn Heisey
Sent: Thursday, June 20, 2013 09:48
To: solr-user@lucene.apache.org
Subject: Re: solr rpm

On 6/20/2013 9:30 AM, adamc wrote:
 I am wondering why there is no official Solr RPM.
 I wish Solr releases rpm like Sphinx does,
 http://sphinxsearch.com/downloads/release/

I agree with you that Solr should be much easier to get running than it
is.  There are some roadblocks, though.

Solr isn't an executable program.  It's a webapp.  It requires a java
servlet container to run.  The Solr *example* has a slimmed down install
of Jetty in it that you can run, but it's just that - an example.  It is
*not* a trivial thing to create an installable package of Solr.

Packages are available for Debian and Ubuntu because those distributions
have maintainers that have done the required work to split our package
into smaller pieces.  They've done a very different job than you would
see if we were to make an installer, because they are integrating Lucene
as a separate dependency for both Solr and for other search packages.

Thanks,
Shawn




Re: update solr.xml dynamically to add new cores

2013-06-20 Thread smanad
Thanks Michael, both the reasons make sense.

Currently I am not planning on using SolrCloud, so as you suggested I can
use the http://wiki.apache.org/solr/CoreAdmin API.
While doing that, did you mean running a curl command similar to this,
http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schem_file_name.xml&dataDir=data
as a part of the 'postinst' script? Or running it manually on the host after the
index package is installed? (I would love to do it as a part of package
installation.)

Also, there will be two cases here: if I am installing a new index package,
CREATE will work; however, if I am updating a package with some tweaks to
configs and schema, then I need to check STATUS to see if the core is
available and, if yes, use RELOAD, else CREATE. Does this make sense?
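
For example, a postinst sketch along these lines could decide between the two
(names and paths are illustrative, and grepping for "instanceDir" is just a
crude way to detect an existing core in the STATUS response):

    STATUS=$(curl -s "http://localhost:8983/solr/admin/cores?action=STATUS&core=coreX&wt=json")
    if echo "$STATUS" | grep -q '"instanceDir"'; then
        curl -s "http://localhost:8983/solr/admin/cores?action=RELOAD&core=coreX"
    else
        curl -s "http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=/path/to/coreX"
    fi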


Michael Della Bitta-2 wrote
 Hi,
 
 I wouldn't edit solr.xml directly for two reasons. One being that an
 already running Solr installation won't update with changes to that file,
 and might actually overwrite the changes that you make to it. And two,
 it's
 going away in a future release of Solr.
 
 Instead, I'd make the package that installed the Solr webapp and brought
 it
 up as you described, and have your independent index packages use either
 the CoreAdmin API or Collection API to create the indexes, depending on
 whether you're using Solr Cloud or not:
 
 http://wiki.apache.org/solr/CoreAdmin
 https://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
 
 
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions <https://twitter.com/Appinions> | g+:
 plus.google.com/appinions
 w: appinions.com <http://www.appinions.com/>
 
 
 On Wed, Jun 19, 2013 at 8:27 PM, smanad lt;

 smanad@

 gt; wrote:
 
 Hi,
 Is there a way to edit solr.xml as a part of debian package installation
 to
 add new cores.
 In my use case, there 4 solr indexes and they are managed/configured by
 different teams.
 The way I am thinking packages will work is as described below,
 1. There will be a solr-base debian package which comes with solr
 installtion with tomcat setup (I am planning to use solr 4.3)
 2. There will be individual index debian packages like,
 solr-index1, solr-index2 which will be dependent on solr-base.
 Each package's DEBIAN postinst script will have a logic to edit solr.xml
 to
 add new index like index1, index2, etc.

 Does this sound good? or is there a better/different way to do this?
 Any pointers will be much appreciated.
 Thanks,
 -M



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/update-solr-xml-dynamically-to-add-new-cores-tp4071800.html
 Sent from the Solr - User mailing list archive at Nabble.com.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-solr-xml-dynamically-to-add-new-cores-tp4071800p4071970.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr, ICUTokenizer with Latin-break-only-on-whitespace

2013-06-20 Thread Jonathan Rochkind

(to solr-user, CC'ing author I'm responding to)

I found the solr-user listserv contribution at:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201305.mbox/%3c51965e70.6070...@elyograg.org%3E

It explains a way you can supply custom rulefiles to the ICUTokenizer, in 
this case to tell it to only break on whitespace for Latin character 
substrings.


I am trying to use the technique explained there in Solr 4.3, but either 
it's not working, or it's not doing what I'd expect.


I want, for instance, "C++ Language" to be tokenized into "C++", 
"Language". I am using rulefiles="Latn:Latin-break-only-on-whitespace.rbbi", 
with the rbbi file from the Solr 4.3 source [1].


But the ICUTokenizer, even with that rulefile, is still stripping 
the punctuation, and tokenizing that into "C", "Language".


Can anyone give me any guidance or hints? I don't entirely understand 
the semantics of the rbbi file to try debugging there. Is something not 
working, or does the rbbi file just not express the semantics I want?


Thanks for any tips.



[1] 
http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_3_0/lucene/analysis/icu/src/test/org/apache/lucene/analysis/icu/segmentation/Latin-break-only-on-whitespace.rbbi?revision=1479557view=markup




Re: Partial update using solr 4.3 with csv input

2013-06-20 Thread smanad
Thanks for confirming. 

So if my input is a csv file, I will need a script to read the delta
changes one by one, convert them to JSON, and then use the 'update' handler with
that piece of JSON data.
Does that make sense?
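
For example, each changed row could become one atomic-update document posted
to the JSON update handler (URL and field names below are illustrative; atomic
updates also need the _version_ field and the updateLog enabled):

    curl 'http://localhost:8983/solr/update?commit=true' \
         -H 'Content-Type: application/json' \
         -d '[{"id":"doc1", "price":{"set":19.95}, "popularity":{"inc":1}}]'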


Jack Krupansky-2 wrote
 Correct, no atomic update for CSV format. There just isn't any place to
 put 
 the atomic update options in such a simple text format.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: smanad
 Sent: Wednesday, June 19, 2013 8:30 PM
 To: 

 solr-user@.apache

 Subject: Partial update using solr 4.3 with csv input
 
 I was going through this link
 http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ and one of
 the comments is about support for csv.
 
 Since the comment is almost a year old, just wondering if this is still
 true
 that, partial updates are possible only with xml and json input?
 
 Thanks,
 -M
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Partial-update-using-solr-4-3-with-csv-input-tp4071801.html
 Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-using-solr-4-3-with-csv-input-tp4071801p4071972.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help on Solr

2013-06-20 Thread Raymond Wiker
On Jun 20, 2013, at 18:26 , Abhishek Bansal abhishek.bansa...@gmail.com wrote:
 Yeah I know, out of the box there is one id field. I removed it from
 schema.xml
 
 I have also added below code to automatically generate an ID.
 
 <field name="id" type="uuid" indexed="true" stored="true" default="NEW"
 multiValued="false"/>
   <field name="name" type="text_general" indexed="true" stored="true"/>
   <field name="text" type="text_general" indexed="true" stored="true"/>

Is that a valid configuration for an id field (assuming that the field id is 
also defined as uniqueKey)?
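
For comparison, a commonly used Solr 4.x setup for auto-generated ids looks
roughly like this (a sketch, not necessarily what is in place here; the update
processor chain is the usual alternative to default="NEW"):

    <!-- schema.xml -->
    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
    <field name="id" type="uuid" indexed="true" stored="true" multiValued="false"/>
    <uniqueKey>id</uniqueKey>

    <!-- solrconfig.xml -->
    <updateRequestProcessorChain name="uuid" default="true">
      <processor class="solr.UUIDUpdateProcessorFactory">
        <str name="fieldName">id</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>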

Re: Solr, ICUTokenizer with Latin-break-only-on-whitespace

2013-06-20 Thread Shawn Heisey

On 6/20/2013 1:26 PM, Jonathan Rochkind wrote:
I want, for instance, C++ Language to be tokenized into C++, 
Language.  But the ICUTokenizer, even with the 
rulefiles=Latn:Latin-break-only-on-whitespace.rbbi, with the rbbi 
file from the Solr 4.3 source [1].


But the ICUTokenizer, even with the that rulefile, is still stripping 
the punctuation, and tokenizing that into C, Language.


This screenshot is using branch_4x downloaded and compiled a couple of 
hours ago, with the rbbi file you mentioned copied to the conf directory:


https://dl.dropboxusercontent.com/u/97770508/icutokenizer-whitespace-only.png

It shows that the ++ is maintained by the ICU tokenizer. It also 
illustrates a UI bug that I will have to show to steffkes where the ++ 
is lost from the input field after analysis.
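
For anyone reproducing this, a fieldType along these lines exercises the
option (a sketch; the .rbbi file sits in the core's conf directory):

    <fieldType name="text_icu_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory"
                   rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>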


Thanks,
Shawn



RE: Informal poll on running Solr 4 on Java 7 with G1GC

2013-06-20 Thread Petersen, Robert
I've been trying it out on solr 3.6.1 with a 32GB heap and G1GC seems to be 
more prone to OOMEs than CMS.  I have been running it on one slave box in our 
farm and the rest of the slaves are still on CMS and three times now it has 
gone OOM on me whereas the rest of our slaves kept chugging along with no 
errors.  I even went from no other tuning params to using the ones suggested on 
Shawn's wiki page (linked below) and that didn't help either; I still got some OOMs.  I'm 
giving it a 'fail' pretty soon here.

-XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError
-XX:+OptimizeStringConcat -XX:+UseFastAccessorMethods
-XX:+UseG1GC -XX:+UseStringCache -XX:-UseSplitVerifier
-XX:MaxGCPauseMillis=50

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Thanks
Robi

-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com] 
Sent: Thursday, June 20, 2013 9:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Informal poll on running Solr 4 on Java 7 with G1GC

Awesome info, thanks Shawn! I'll post back my results with G1 after we've had 
some time to analyze it in production.

On Thu, Jun 20, 2013 at 11:01 AM, Shawn Heisey s...@elyograg.org wrote:
 On 6/20/2013 8:02 AM, John Nielsen wrote:

 We used to use G1, but recently went back to CMS.

 G1 gave us too long stop-the-world events. CMS uses more ressources 
 for the same work, but it is more predictable and we get better 
 worst-case performance out of it.


 This is exactly the behavior I saw.  When you take a look at the 
 overall stats and the memory graph over time, G1 looks way better. 
 Unfortunately GC with any collector does sometimes get bad, and when 
 that happens, un-tuned
 G1 is a little worse than un-tuned CMS.  Perhaps if G1 were tuned, it 
 would be really good, but I haven't been able to find any information 
 on how to tune G1.

 jHiccup or gclogviewer can give you really good insight into how your 
 GC is doing in both average and worst-case scenarios.  jHiccup is a 
 wrapper for your program and gclogviewer draws graphs from GC logs.  
 I'm not sure whether gclogviewer works with G1 logs or not, but I know 
 that jHiccup will work with G1.

 http://www.azulsystems.com/downloads/jHiccup
 http://code.google.com/p/gclogviewer/downloads/list
 http://code.google.com/p/gclogviewer/source/checkout
 http://code.google.com/p/gclogviewer/issues/detail?id=7

 Thanks,
 Shawn





Re: solr rpm

2013-06-20 Thread Alexandre Rafalovitch
On Thu, Jun 20, 2013 at 12:48 PM, Shawn Heisey s...@elyograg.org wrote:
 They've done a very different job than you would see if we were to make an
 installer, because they are integrating Lucene as a separate dependency for
 both Solr and for other search packages.

Is that the only thing that's different? Does this one thing make a
lot of impact?

Or is there a bunch of others?

I wonder if one could write a 'reasonable deployment guide'. Would
that then make it easier for package creators to do their job (Solr in
Puppet, Solr in Chef, Solr in RPM, Solr on Windows, etc)?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: solr rpm

2013-06-20 Thread Shawn Heisey

On 6/20/2013 1:44 PM, Alexandre Rafalovitch wrote:

On Thu, Jun 20, 2013 at 12:48 PM, Shawn Heisey s...@elyograg.org wrote:

They've done a very different job than you would see if we were to make an
installer, because they are integrating Lucene as a separate dependency for
both Solr and for other search packages.


Is that the only thing that's different? Does this one thing make a
lot of impact?


They've split it into solr-common, solr-jetty, solr-tomcat, and at least 
one other package that's just lucene, no Solr.



I wonder if one could write a 'reasonable deployment guide'. Would
that then make it easier for package creators to do their job (Solr in
Puppet, Solr in Chef, Solr in RPM, Solr on Windows, etc)?


I would really like to do this, and I've filed some issues in JIRA about 
it.  For me, it's just an issue of available time to work on it.  I'm 
willing to do the work, and IMHO I'm reasonably capable.  If you want to 
put some time into it, I'm willing to look it over and make sure it gets 
included and/or mentioned in what users see when they download.


Thanks,
Shawn



Re: Solr, ICUTokenizer with Latin-break-only-on-whitespace

2013-06-20 Thread Jonathan Rochkind
Thank you... I started out writing an email with screenshots proving 
that it wasn't working for me in 4.3.0... and of course, having to 
confirm every single detail in order to say I confirmed it... I realized 
it was a mistake on my part, not testing what I thought I was testing.


Does indeed appear to be working now. Thanks! And thanks for this feature.


On 6/20/2013 3:40 PM, Shawn Heisey wrote:

On 6/20/2013 1:26 PM, Jonathan Rochkind wrote:

I want, for instance, C++ Language to be tokenized into C++,
Language.  But the ICUTokenizer, even with the
rulefiles=Latn:Latin-break-only-on-whitespace.rbbi, with the rbbi
file from the Solr 4.3 source [1].

But the ICUTokenizer, even with the that rulefile, is still stripping
the punctuation, and tokenizing that into C, Language.


This screenshot is using branch_4x downloaded and compiled a couple of
hours ago, with the rbbi file you mentioned copied to the conf directory:

https://dl.dropboxusercontent.com/u/97770508/icutokenizer-whitespace-only.png


It shows that the ++ is maintained by the ICU tokenizer. It also
illustrates a UI bug that I will have to show to steffkes where the ++
is lost from the input field after analysis.

Thanks,
Shawn



Re: Partial update using solr 4.3 with csv input

2013-06-20 Thread Jack Krupansky

I'd have to see the whole scenario...

What's an example of the original input, and then some examples of the kind 
of updates.


Generally, CSV is most useful simply to bulk import (or export) data. It 
wasn't really designed for incremental update of existing documents.


-- Jack Krupansky

-Original Message- 
From: smanad

Sent: Thursday, June 20, 2013 3:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Partial update using solr 4.3 with csv input

Thanks for confirming.

So if my input is a csv file, I will need a script to read the delta
changes one by one, convert it to json and then use 'update' handler with
that piece of json data.
Makes sense?


Jack Krupansky-2 wrote

Correct, no atomic update for CSV format. There just isn't any place to
put
the atomic update options in such a simple text format.

-- Jack Krupansky

-Original Message- 
From: smanad

Sent: Wednesday, June 19, 2013 8:30 PM
To:



solr-user@.apache



Subject: Partial update using solr 4.3 with csv input

I was going through this link
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ and one of
the comments is about support for csv.

Since the comment is almost a year old, just wondering if this is still
true
that, partial updates are possible only with xml and json input?

Thanks,
-M



--
View this message in context:
http://lucene.472066.n3.nabble.com/Partial-update-using-solr-4-3-with-csv-input-tp4071801.html
Sent from the Solr - User mailing list archive at Nabble.com.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-using-solr-4-3-with-csv-input-tp4071801p4071972.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Multivalued facet with 0 unexpected results

2013-06-20 Thread Samuel García Martínez
Hi all, we are getting some facet values (faceting a multivalued field)
with 0 results using a *:* query. I think this is really strange: since we
are using MatchAllQuery, there is no way we can get 0 results for any value.

Those 0-result values were present in the index before the reindex we made.
We fixed it, so far, by sending a commit and an optimize.

We still need to obtain facets with 0 results with current values, so
mincount is not an option.

Solr version: 3.6.0
Grouping: false
post grouping faceting: false
filter queries: 0

Is this a known bug or intended behaviour? Does it only happen with
UnInvertedField?

Many thanks :)
-- 
Un saludo,
Samuel García.


Re: Multivalued facet with 0 unexpected results

2013-06-20 Thread Samuel García Martínez
Just to clarify, we manually sent the commit and optimize that we used to fix
this problem. The index process sends its own commit, making the new facet
values searchable. But it seems that this process is not removing any
previous values used by the UnInvertedField.


On Thu, Jun 20, 2013 at 11:42 PM, Samuel García Martínez 
samuelgmarti...@gmail.com wrote:

 Hi all, we are getting some facet (faceting a multivalued field) values
 with 0 results using *:* query. I think this is really strange, since we
 are using MatchAllQuery there is no way we can get 0 results in any value.

 That 0 results values were present in the index before the reindex we
 made. We fixed it, so far, sending a commit and an optimize.

 We still need to obtain facets with 0 results with current values, so
 mincount is not an option.

 Solr version: 3.6.0
 Grouping: false
 post grouping faceting: false
 filter queries: 0

 Is this any known bug or an intended behaviour? Only happens with
 uninvertedfield?

 Many thanks :)
 --
 Un saludo,
 Samuel García.




-- 
Un saludo,
Samuel García.


Re: Partial update using solr 4.3 with csv input

2013-06-20 Thread Shalin Shekhar Mangar
Note that even though partial updates sounds like what you should do
(because only part of your data has changed), unless you are dealing
with lots of data, just re-adding everything (if possible) can be
plenty fast. So before you write complex code to construct partial
updates from your csv files, benchmark to see if it's really a
problem. For example, we used to fully import a DB (~800K records)
always because it'd take around 5 minutes - there was no need to write
a delta system.

On Fri, Jun 21, 2013 at 12:58 AM, smanad sma...@gmail.com wrote:
 Thanks for confirming.

 So if my input is a csv file, I will need a script to read the delta
 changes one by one, convert it to json and then use 'update' handler with
 that piece of json data.
 Makes sense?


 Jack Krupansky-2 wrote
 Correct, no atomic update for CSV format. There just isn't any place to
 put
 the atomic update options in such a simple text format.

 -- Jack Krupansky

 -Original Message-
 From: smanad
 Sent: Wednesday, June 19, 2013 8:30 PM
 To:

 solr-user@.apache

 Subject: Partial update using solr 4.3 with csv input

 I was going through this link
 http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ and one of
 the comments is about support for csv.

 Since the comment is almost a year old, just wondering if this is still
 true
 that, partial updates are possible only with xml and json input?

 Thanks,
 -M



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Partial-update-using-solr-4-3-with-csv-input-tp4071801.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Partial-update-using-solr-4-3-with-csv-input-tp4071801p4071972.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Multivalued facet with 0 unexpected results

2013-06-20 Thread Shalin Shekhar Mangar
It sounds suspiciously similar to
https://issues.apache.org/jira/browse/SOLR-3793 which was fixed in
Solr 4.0

You should upgrade to a more recent Solr version (4.3.1 is the latest)
and see if it's still a problem for you.

On Fri, Jun 21, 2013 at 3:19 AM, Samuel García Martínez
samuelgmarti...@gmail.com wrote:
 just to clarify, we send manually the commit and optimize we use to fix
 this problem. The index process send its own commit, making searchable
 the new facet values. But it seems that this process is not deleting any
 previous value filled used by the uninvertedfield.


 On Thu, Jun 20, 2013 at 11:42 PM, Samuel García Martínez 
 samuelgmarti...@gmail.com wrote:

 Hi all, we are getting some facet (faceting a multivalued field) values
 with 0 results using *:* query. I think this is really strange, since we
 are using MatchAllQuery there is no way we can get 0 results in any value.

 That 0 results values were present in the index before the reindex we
 made. We fixed it, so far, sending a commit and an optimize.

 We still need to obtain facets with 0 results with current values, so
 mincount is not an option.

 Solr version: 3.6.0
 Grouping: false
 post grouping faceting: false
 filter queries: 0

 Is this any known bug or an intended behaviour? Only happens with
 uninvertedfield?

 Many thanks :)
 --
 Un saludo,
 Samuel García.




 --
 Un saludo,
 Samuel García.



-- 
Regards,
Shalin Shekhar Mangar.


SolrCloud replication issues

2013-06-20 Thread Sven Stark
Hello,

first: I am pretty much a Solr newcomer, so don't necessarily assume basic
solr knowledge.

My problem is that in my setup SolrCloud seems to create way too much
network traffic for replication. I hope I'm just missing some proper config
options. Here's the setup first:

* I am running a five node SolrCloud cluster on top of an external 5 node
zookeeper cluster, according to logs and clusterstate.json all nodes find
each other and are happy
* Solr version is now 4.3.1, but the problem also existed on 4.1.0 ( I
thought upgrade might solve the issue because of
https://issues.apache.org/jira/browse/SOLR-4471)
* there is only one shard
* solr.xml and solrconfig.xml are out of the box, except for the enabled
soft commit

 <autoSoftCommit>
   <maxTime>1000</maxTime>
 </autoSoftCommit>

* our index is minimal at the moment (dev and testing stage) 20-30Mb, about
30k small docs

The issue is that when I run smallish load tests against our app, which posts
ca. 1-2 docs/sec to Solr, the SolrCloud leader creates outgoing network traffic
of 20-30 MByte/sec and the non-leaders receive 4-8 MByte/sec each.

The non-leaders logs are full of entries like

INFO  - 2013-06-21 01:08:58.624;
org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
we recover
INFO  - 2013-06-21 01:08:58.640;
org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
we recover
INFO  - 2013-06-21 01:08:58.643;
org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
we recover
INFO  - 2013-06-21 01:08:58.651;
org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
we recover
INFO  - 2013-06-21 01:08:58.892;
org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
we recover
INFO  - 2013-06-21 01:08:58.893;
org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
we recover

So my assumption is I am making config errors and the cloud leader tries to
push the index to all non-leaders over and over again. But I couldn't
really find much doco on how to properly configure SolrCloud replication
online.

Any hints and help much appreciated. I can provide more info or data, just
let me know what you need.

Thanks in advance,
Sven


Re: SolrCloud replication issues

2013-06-20 Thread Shalin Shekhar Mangar
This doesn't seem right. A leader will ask a replica to recover only
when an update request could not be forwarded to it. Can you check
your leader logs to see why updates are not being sent through to
replicas?

On Fri, Jun 21, 2013 at 7:03 AM, Sven Stark sven.st...@m-square.com.au wrote:
 Hello,

 first: I am pretty much a Solr newcomer, so don't necessarily assume basic
 solr knowledge.

 My problem is that in my setup SolrCloud seems to create way too much
 network traffic for replication. I hope I'm just missing some proper config
 options. Here's the setup first:

 * I am running a five node SolrCloud cluster on top of an external 5 node
 zookeeper cluster, according to logs and clusterstate.json all nodes find
 each other and are happy
 * Solr version is now 4.3.1, but the problem also existed on 4.1.0 ( I
 thought upgrade might solve the issue because of
 https://issues.apache.org/jira/browse/SOLR-4471)
 * there is only one shard
 * solr.xml and solrconfig.xml are out of the box, except for the enabled
 soft commit

  autoSoftCommit
maxTime1000/maxTime
  /autoSoftCommit

 * our index is minimal at the moment (dev and testing stage) 20-30Mb, about
 30k small docs

 The issue is when I run smallish load tests against our app which posts ca
 1-2 docs/sec to solr, the SolrCloud leader creates outgoing network traffic
 of 20-30Mbyte/sec and the non-leader receive 4-8MByte/sec each.

 The non-leaders logs are full of entries like

 INFO  - 2013-06-21 01:08:58.624;
 org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
 we recover
 INFO  - 2013-06-21 01:08:58.640;
 org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
 we recover
 INFO  - 2013-06-21 01:08:58.643;
 org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
 we recover
 INFO  - 2013-06-21 01:08:58.651;
 org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
 we recover
 INFO  - 2013-06-21 01:08:58.892;
 org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
 we recover
 INFO  - 2013-06-21 01:08:58.893;
 org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
 we recover

 So my assumption is I am making config errors and the cloud leader tries to
 push the index to all non-leaders over and over again. But I couldn't
 really find much doco on how to properly configure SolrCloud replication
 online.

 Any hints and help much appreciated. I can provide more info or data, just
 let me know what you need.

 Thanks in advance,
 Sven



-- 
Regards,
Shalin Shekhar Mangar.


Re: Sharding and Replication clarification

2013-06-20 Thread Shalin Shekhar Mangar
On Wed, Jun 19, 2013 at 11:12 PM, Asif talla...@gmail.com wrote:
 Hi,

 I had questions on implementation of Sharding and Replication features of
 Solr/Cloud.

 1. I noticed that when sharding is enabled for a collection - individual
 requests are sent to each node serving as a shard.

Yes, search requests are distributed to a member of each logical
shard. If you know the shard that you want to search you can specify a
shards=shard1,shard2 parameter to limit searches to those shards only.
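
For example (collection and shard names illustrative):

    http://localhost:8983/solr/collection1/select?q=*:*&shards=shard1,shard2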


 2. Replication too follows above strategy of sending individual documents
 to the nodes serving as a replica.

Yes, full documents are replicated in SolrCloud for normal updates
instead of index fragments as used to be the case in non-cloud
replication. The old replication method is still used for recovery.


 I am working with a system that requires a massive number of writes - I have
 noticed that, due to the above reason, the cloud eventually starts to fail
 (even though I am using an ensemble).

 I do understand the reason behind individual updates - but why not batch
 them up, or give an option to batch N updates in either of the above cases? I
 did come across a presentation that talked about batching 10 updates for
 replication at least, but I do not think this is the case.
 - Asif

If you send a batch update request then the requests to replicas will
be batched. I think the default is that 10 documents per replica node
are batched together. But if you send one document at a time then the
replication will also happen one document at a time.
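
As an illustration, one request carrying several documents (ids and fields
made up) is what gets forwarded to the replicas in batches:

    curl 'http://localhost:8983/solr/collection1/update?commitWithin=10000' \
         -H 'Content-Type: application/json' \
         -d '[{"id":"1","name":"doc one"},
              {"id":"2","name":"doc two"},
              {"id":"3","name":"doc three"}]'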


--
Regards,
Shalin Shekhar Mangar.


Re: SolrCloud replication issues

2013-06-20 Thread Sven Stark
Thanks for the super quick reply.

The logs are pretty big, but one thing comes up over and over again:

Leader side:

ERROR - 2013-06-21 01:44:24.014; org.apache.solr.common.SolrException;
shard update error StdNode:
http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok
status:500, message:Internal Server Error
ERROR - 2013-06-21 01:44:24.015; org.apache.solr.common.SolrException;
shard update error StdNode:
http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok
status:500, message:Internal Server Error
ERROR - 2013-06-21 01:44:24.015; org.apache.solr.common.SolrException;
shard update error StdNode:
http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok
status:500, message:Internal Server Error

Non-Leader side:

757682 [RecoveryThread] ERROR org.apache.solr.update.PeerSync  – PeerSync:
core=collection1 url=http://xxx:xxx:xx:xx:8983/solr Error applying updates
from [Ljava.lang.String;@1be0799a ,update=[1, 1438251416655233024,
SolrInputDocument[type=topic, fullId=9ce54310-d89a-11e2-b89d-22000af02b44,
account=account1, site=mySite, topic=topic5, id=account1mySitetopic5,
totalCount=195, approvedCount=195, declinedCount=0, flaggedCount=0,
createdOn=2013-06-19T04:42:14.329Z, updatedOn=2013-06-19T04:42:14.386Z,
_version_=1438251416655233024]]
java.lang.UnsupportedOperationException
at
org.apache.lucene.queries.function.FunctionValues.longVal(FunctionValues.java:46)
at
org.apache.solr.update.VersionInfo.getVersionFromIndex(VersionInfo.java:201)
at
org.apache.solr.update.UpdateLog.lookupVersion(UpdateLog.java:718)
at
org.apache.solr.update.VersionInfo.lookupVersion(VersionInfo.java:184)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:635)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:487)
at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:335)
at org.apache.solr.update.PeerSync.sync(PeerSync.java:265)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:366)
at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)

Unfortunately I don't see what kind of UnsupportedOperation this could be
referring to.

Many thanks,
Sven


On Fri, Jun 21, 2013 at 11:44 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 This doesn't seem right. A leader will ask a replica to recover only
 when an update request could not be forwarded to it. Can you check
 your leader logs to see why updates are not being sent through to
 replicas?

 On Fri, Jun 21, 2013 at 7:03 AM, Sven Stark sven.st...@m-square.com.au
 wrote:
  Hello,
 
  first: I am pretty much a Solr newcomer, so don't necessarily assume
 basic
  solr knowledge.
 
  My problem is that in my setup SolrCloud seems to create way too much
  network traffic for replication. I hope I'm just missing some proper
 config
  options. Here's the setup first:
 
  * I am running a five node SolrCloud cluster on top of an external 5 node
  zookeeper cluster, according to logs and clusterstate.json all nodes find
  each other and are happy
  * Solr version is now 4.3.1, but the problem also existed on 4.1.0 ( I
  thought upgrade might solve the issue because of
  https://issues.apache.org/jira/browse/SOLR-4471)
  * there is only one shard
  * solr.xml and solrconfig.xml are out of the box, except for the enabled
  soft commit
 
   autoSoftCommit
 maxTime1000/maxTime
   /autoSoftCommit
 
  * our index is minimal at the moment (dev and testing stage) 20-30Mb,
 about
  30k small docs
 
  The issue is when I run smallish load tests against our app which posts
 ca
  1-2 docs/sec to solr, the SolrCloud leader creates outgoing network
 traffic
  of 20-30Mbyte/sec and the non-leader receive 4-8MByte/sec each.
 
  The non-leaders logs are full of entries like
 
  INFO  - 2013-06-21 01:08:58.624;
  org.apache.solr.handler.admin.CoreAdminHandler; It has been requested
 that
  we recover
  INFO  - 2013-06-21 01:08:58.640;
  org.apache.solr.handler.admin.CoreAdminHandler; It has been requested
 that
  we recover
  INFO  - 2013-06-21 01:08:58.643;
  org.apache.solr.handler.admin.CoreAdminHandler; It has been requested
 that
  we recover
  INFO  - 2013-06-21 01:08:58.651;
  org.apache.solr.handler.admin.CoreAdminHandler; It has been requested
 that
  we recover
  INFO 

Re: SolrCloud replication issues

2013-06-20 Thread Sven Stark
Actually this looks very much like

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201304.mbox/%3ccacbkj07ob4kjxwe_ogzfuqg5qg99qwpovbzkdota8bihcis...@mail.gmail.com%3E

Sven


On Fri, Jun 21, 2013 at 11:54 AM, Sven Stark sven.st...@m-square.com.auwrote:

 Thanks for the super quick reply.

 The logs are pretty big, but one thing comes up over and over again:

 Leader side:

 ERROR - 2013-06-21 01:44:24.014; org.apache.solr.common.SolrException;
 shard update error StdNode: 
 http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok
 status:500, message:Internal Server Error
 ERROR - 2013-06-21 01:44:24.015; org.apache.solr.common.SolrException;
 shard update error StdNode: 
 http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok
 status:500, message:Internal Server Error
 ERROR - 2013-06-21 01:44:24.015; org.apache.solr.common.SolrException;
 shard update error StdNode: 
 http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok
 status:500, message:Internal Server Error

 Non-Leader side:

 757682 [RecoveryThread] ERROR org.apache.solr.update.PeerSync  – PeerSync:
 core=collection1 url=http://xxx:xxx:xx:xx:8983/solr Error applying
 updates from [Ljava.lang.String;@1be0799a ,update=[1,
 1438251416655233024, SolrInputDocument[type=topic,
 fullId=9ce54310-d89a-11e2-b89d-22000af02b44, account=account1, site=mySite,
 topic=topic5, id=account1mySitetopic5, totalCount=195, approvedCount=195,
 declinedCount=0, flaggedCount=0, createdOn=2013-06-19T04:42:14.329Z,
 updatedOn=2013-06-19T04:42:14.386Z, _version_=1438251416655233024]]
 java.lang.UnsupportedOperationException
 at
 org.apache.lucene.queries.function.FunctionValues.longVal(FunctionValues.java:46)
 at
 org.apache.solr.update.VersionInfo.getVersionFromIndex(VersionInfo.java:201)
 at
 org.apache.solr.update.UpdateLog.lookupVersion(UpdateLog.java:718)
 at
 org.apache.solr.update.VersionInfo.lookupVersion(VersionInfo.java:184)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:635)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
 at
 org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
 at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:487)
 at
 org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:335)
 at org.apache.solr.update.PeerSync.sync(PeerSync.java:265)
 at
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:366)
 at
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)

 Unfortunately I don't see what kind of UnsupportedOperation this could be
 referring to.

 Many thanks,
 Sven


 On Fri, Jun 21, 2013 at 11:44 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 This doesn't seem right. A leader will ask a replica to recover only
 when an update request could not be forwarded to it. Can you check
 your leader logs to see why updates are not being sent through to
 replicas?

 On Fri, Jun 21, 2013 at 7:03 AM, Sven Stark sven.st...@m-square.com.au
 wrote:
  Hello,
 
  first: I am pretty much a Solr newcomer, so don't necessarily assume
 basic
  solr knowledge.
 
  My problem is that in my setup SolrCloud seems to create way too much
  network traffic for replication. I hope I'm just missing some proper
 config
  options. Here's the setup first:
 
  * I am running a five node SolrCloud cluster on top of an external 5
 node
  zookeeper cluster, according to logs and clusterstate.json all nodes
 find
  each other and are happy
  * Solr version is now 4.3.1, but the problem also existed on 4.1.0 ( I
  thought upgrade might solve the issue because of
  https://issues.apache.org/jira/browse/SOLR-4471)
  * there is only one shard
  * solr.xml and solrconfig.xml are out of the box, except for the enabled
  soft commit
 
   autoSoftCommit
 maxTime1000/maxTime
   /autoSoftCommit
 
  * our index is minimal at the moment (dev and testing stage) 20-30Mb,
 about
  30k small docs
 
  The issue is when I run smallish load tests against our app which posts
 ca
  1-2 docs/sec to solr, the SolrCloud leader creates outgoing network
 traffic
  of 20-30Mbyte/sec and the non-leader receive 4-8MByte/sec each.
 
  The non-leaders logs are full of entries like
 
  INFO  - 2013-06-21 01:08:58.624;
  org.apache.solr.handler.admin.CoreAdminHandler; It has been requested
 that
  we recover
  INFO  - 2013-06-21 01:08:58.640;
  

Re: SolrCloud replication issues

2013-06-20 Thread Shalin Shekhar Mangar
Okay so from the same thread, have you made sure the _version_ field
is a long in schema?

<field name="_version_" type="long" indexed="true" stored="true"
multiValued="false"/>

On Fri, Jun 21, 2013 at 7:44 AM, Sven Stark sven.st...@m-square.com.au wrote:
 Actually this looks very much like

 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201304.mbox/%3ccacbkj07ob4kjxwe_ogzfuqg5qg99qwpovbzkdota8bihcis...@mail.gmail.com%3E

 Sven


 On Fri, Jun 21, 2013 at 11:54 AM, Sven Stark 
 sven.st...@m-square.com.auwrote:

 Thanks for the super quick reply.

 The logs are pretty big, but one thing comes up over and over again:

 Leader side:

 ERROR - 2013-06-21 01:44:24.014; org.apache.solr.common.SolrException;
 shard update error StdNode: 
 http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok
 status:500, message:Internal Server Error
 ERROR - 2013-06-21 01:44:24.015; org.apache.solr.common.SolrException;
 shard update error StdNode: 
 http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok
 status:500, message:Internal Server Error
 ERROR - 2013-06-21 01:44:24.015; org.apache.solr.common.SolrException;
 shard update error StdNode: 
 http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok
 status:500, message:Internal Server Error

 Non-Leader side:

 757682 [RecoveryThread] ERROR org.apache.solr.update.PeerSync  – PeerSync:
 core=collection1 url=http://xxx:xxx:xx:xx:8983/solr Error applying
 updates from [Ljava.lang.String;@1be0799a ,update=[1,
 1438251416655233024, SolrInputDocument[type=topic,
 fullId=9ce54310-d89a-11e2-b89d-22000af02b44, account=account1, site=mySite,
 topic=topic5, id=account1mySitetopic5, totalCount=195, approvedCount=195,
 declinedCount=0, flaggedCount=0, createdOn=2013-06-19T04:42:14.329Z,
 updatedOn=2013-06-19T04:42:14.386Z, _version_=1438251416655233024]]
 java.lang.UnsupportedOperationException
 at
 org.apache.lucene.queries.function.FunctionValues.longVal(FunctionValues.java:46)
 at
 org.apache.solr.update.VersionInfo.getVersionFromIndex(VersionInfo.java:201)
 at
 org.apache.solr.update.UpdateLog.lookupVersion(UpdateLog.java:718)
 at
 org.apache.solr.update.VersionInfo.lookupVersion(VersionInfo.java:184)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:635)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
 at
 org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
 at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:487)
 at
 org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:335)
 at org.apache.solr.update.PeerSync.sync(PeerSync.java:265)
 at
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:366)
 at
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)

 Unfortunately I don't see what kind of UnsupportedOperation this could be
 referring to.

 Many thanks,
 Sven


 On Fri, Jun 21, 2013 at 11:44 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 This doesn't seem right. A leader will ask a replica to recover only
 when an update request could not be forwarded to it. Can you check
 your leader logs to see why updates are not being sent through to
 replicas?

 On Fri, Jun 21, 2013 at 7:03 AM, Sven Stark sven.st...@m-square.com.au
 wrote:
  Hello,
 
  first: I am pretty much a Solr newcomer, so don't necessarily assume
 basic
  solr knowledge.
 
  My problem is that in my setup SolrCloud seems to create way too much
  network traffic for replication. I hope I'm just missing some proper
 config
  options. Here's the setup first:
 
  * I am running a five node SolrCloud cluster on top of an external 5
 node
  zookeeper cluster, according to logs and clusterstate.json all nodes
 find
  each other and are happy
  * Solr version is now 4.3.1, but the problem also existed on 4.1.0 ( I
  thought upgrade might solve the issue because of
  https://issues.apache.org/jira/browse/SOLR-4471)
  * there is only one shard
  * solr.xml and solrconfig.xml are out of the box, except for the enabled
  soft commit
 
   autoSoftCommit
 maxTime1000/maxTime
   /autoSoftCommit
 
  * our index is minimal at the moment (dev and testing stage) 20-30Mb,
 about
  30k small docs
 
  The issue is when I run smallish load tests against our app which posts
 ca
  1-2 docs/sec to solr, the SolrCloud leader creates outgoing network
 traffic
  of 20-30Mbyte/sec and the non-leader receive 

Re: Informal poll on running Solr 4 on Java 7 with G1GC

2013-06-20 Thread William Bell
It would be good to see some CMS configs too... Can you send your java
params?
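
For example, something along the lines of the commonly cited CMS starting
point below (illustrative flags only, not anyone's actual settings from this
thread)?

    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
    -XX:+CMSParallelRemarkEnabled
    -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
    -XX:NewRatio=3 -XX:SurvivorRatio=4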


On Wed, Jun 19, 2013 at 8:46 PM, Shawn Heisey s...@elyograg.org wrote:

 On 6/19/2013 4:18 PM, Timothy Potter wrote:
  I'm sure there's some site to do this but wanted to get a feel for
  who's running Solr 4 on Java 7 with G1 gc enabled?

 I have tried it, but found that G1 didn't give me any better GC pause
 characteristics than CMS without tuning, and may have actually been
 worse.  Now I use CMS with several tuning options.

 Thanks,
 Shawn




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SolrCloud replication issues

2013-06-20 Thread Sven Stark
I think you're onto it. Our schema.xml had it

<field name="_version_" type="string" indexed="true" stored="true"
multiValued="false"/>

I'll change and test it. Will probably not happen before Monday though.

Many thanks already,
Sven



On Fri, Jun 21, 2013 at 2:18 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Okay so from the same thread, have you made sure the _version_ field
 is a long in schema?

 field name=_version_ type=long indexed=true stored=true
 multiValued=false/

 On Fri, Jun 21, 2013 at 7:44 AM, Sven Stark sven.st...@m-square.com.au
 wrote:
  Actually this looks very much like
 
 
 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201304.mbox/%3ccacbkj07ob4kjxwe_ogzfuqg5qg99qwpovbzkdota8bihcis...@mail.gmail.com%3E
 
  Sven
 
 
  On Fri, Jun 21, 2013 at 11:54 AM, Sven Stark sven.st...@m-square.com.au
 wrote:
 
  Thanks for the super quick reply.
 
  The logs are pretty big, but one thing comes up over and over again:
 
  Leader side:
 
  ERROR - 2013-06-21 01:44:24.014; org.apache.solr.common.SolrException; shard update error StdNode: http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok status:500, message:Internal Server Error
  ERROR - 2013-06-21 01:44:24.015; org.apache.solr.common.SolrException; shard update error StdNode: http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok status:500, message:Internal Server Error
  ERROR - 2013-06-21 01:44:24.015; org.apache.solr.common.SolrException; shard update error StdNode: http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok status:500, message:Internal Server Error
 
  Non-Leader side:
 
  757682 [RecoveryThread] ERROR org.apache.solr.update.PeerSync – PeerSync: core=collection1 url=http://xxx:xxx:xx:xx:8983/solr Error applying updates from [Ljava.lang.String;@1be0799a ,update=[1, 1438251416655233024, SolrInputDocument[type=topic, fullId=9ce54310-d89a-11e2-b89d-22000af02b44, account=account1, site=mySite, topic=topic5, id=account1mySitetopic5, totalCount=195, approvedCount=195, declinedCount=0, flaggedCount=0, createdOn=2013-06-19T04:42:14.329Z, updatedOn=2013-06-19T04:42:14.386Z, _version_=1438251416655233024]]
  java.lang.UnsupportedOperationException
          at org.apache.lucene.queries.function.FunctionValues.longVal(FunctionValues.java:46)
          at org.apache.solr.update.VersionInfo.getVersionFromIndex(VersionInfo.java:201)
          at org.apache.solr.update.UpdateLog.lookupVersion(UpdateLog.java:718)
          at org.apache.solr.update.VersionInfo.lookupVersion(VersionInfo.java:184)
          at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:635)
          at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
          at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
          at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:487)
          at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:335)
          at org.apache.solr.update.PeerSync.sync(PeerSync.java:265)
          at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:366)
          at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
 
  Unfortunately I don't see what kind of UnsupportedOperation this could be referring to.
 
  Many thanks,
  Sven
 
 
  On Fri, Jun 21, 2013 at 11:44 AM, Shalin Shekhar Mangar 
  shalinman...@gmail.com wrote:
 
  This doesn't seem right. A leader will ask a replica to recover only
  when an update request could not be forwarded to it. Can you check
  your leader logs to see why updates are not being sent through to
  replicas?
 
  On Fri, Jun 21, 2013 at 7:03 AM, Sven Stark sven.st...@m-square.com.au wrote:
   Hello,

   first: I am pretty much a Solr newcomer, so don't necessarily assume basic
   solr knowledge.

   My problem is that in my setup SolrCloud seems to create way too much
   network traffic for replication. I hope I'm just missing some proper config
   options. Here's the setup first:

   * I am running a five node SolrCloud cluster on top of an external 5 node
   zookeeper cluster, according to logs and clusterstate.json all nodes find
   each other and are happy
   * Solr version is now 4.3.1, but the problem also existed on 4.1.0 (I
   thought upgrade might solve the issue because of
   https://issues.apache.org/jira/browse/SOLR-4471)
   * there is only one shard
   * solr.xml and solrconfig.xml are out of the 

Queuing for Solr Updates?

2013-06-20 Thread William Bell
Is there a simple way to kick off a DIH handler update when one is already running?

Scenario:

1. Doing an update using DIH
2. We need to kick off another update. Cannot since DIH is already running.
So the program inserts into a table (ID=55)
3. Since the DIH is still running the old update, we cannot fire an update to
DIH.

We want it to run right away. Is there a simple way to queue up an update
if the DIH is still running? We wrote the query to catch all pending, so we only
need to run it again if there are one or more pending updates.

Thoughts?

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Varnish

2013-06-20 Thread William Bell
Who is using Varnish in front of Solr?

Anyone have any configs that work with the cache control headers of Solr?

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076
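
For reference (not from this thread): whether there is anything for Varnish to cache depends on the <httpCaching> section of solrconfig.xml, which controls the HTTP cache headers Solr sends. A minimal sketch, with an illustrative max-age, might look like:

  <httpCaching never304="false" lastModifiedFrom="openTime" etagSeed="Solr">
    <cacheControl>max-age=600, public</cacheControl>
  </httpCaching>

With never304 set to false, Solr generates Last-Modified and ETag headers from the searcher's open time, which a reverse proxy such as Varnish can use for caching and revalidation.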


Re: Queuing for Solr Updates?

2013-06-20 Thread Gora Mohanty
On 21 June 2013 11:12, William Bell billnb...@gmail.com wrote:
 Is there a simple way to kick off a DIH handler update when one is already running?

 Scenario:

 1. Doing an update using DIH
 2. We need to kick off another update. Cannot since DIH is already running.
 So the program inserts into a table (ID=55)
 3. Since the DIH is still running the old update, we cannot fire an update to
 DIH.

 We want it to run right away. Is there a simple way to queue up an update
 if the DIH is still running? We wrote the query to catch all pending, so we only
 need to run it again if there are one or more pending updates.
[...]

Your best bet might be to write an external daemon that
monitors the DIH import status as well as the update
queue, and launches a new DIH import as required.

Regards,
Gora
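
A minimal sketch of such a daemon in Java, assuming the stock DIH endpoint and a placeholder for the application's own pending-updates check (the core name, URL and helper names are illustrative):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.HttpURLConnection;
  import java.net.URL;

  /**
   * Polls the DIH status endpoint and fires a delta-import only when the
   * handler is idle and the application has queued work. pendingUpdates()
   * is a stand-in for the "1 or more pending" query mentioned above.
   */
  public class DihQueueDaemon {

      private static final String DIH =
          "http://localhost:8983/solr/collection1/dataimport";

      public static void main(String[] args) throws Exception {
          while (true) {
              if (pendingUpdates() && isIdle()) {
                  // clean=false so the existing index is not wiped on each run
                  httpGet(DIH + "?command=delta-import&clean=false");
              }
              Thread.sleep(5000); // poll every 5 seconds
          }
      }

      private static boolean isIdle() throws Exception {
          // The status response contains <str name="status">idle</str>
          // while no import is running, and "busy" otherwise.
          return httpGet(DIH + "?command=status").contains("idle");
      }

      private static boolean pendingUpdates() {
          // Placeholder: check the application's pending-updates table here.
          return true;
      }

      private static String httpGet(String url) throws Exception {
          HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
          StringBuilder sb = new StringBuilder();
          try (BufferedReader in = new BufferedReader(
                  new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
              String line;
              while ((line = in.readLine()) != null) {
                  sb.append(line);
              }
          }
          return sb.toString();
      }
  }

The same loop could equally fire a full-import; the important part is that the status check keeps the daemon from issuing a second import while one is still in progress.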