Re: DIH import from MySQL results in garbage text for special chars

2012-09-27 Thread Pranav Prakash
The output of SHOW VARIABLES goes like this. I have verified the hex
values and they are different in MySQL and Solr.

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+



*Pranav Prakash*

temet nosce



On Wed, Sep 26, 2012 at 6:45 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 21 September 2012 11:19, Pranav Prakash pra...@gmail.com wrote:

  I am seeing the garbage text in browser, Luke Index Toolbox and
 everywhere
  it is the same. My servlet container is Jetty which is the out-of-box
 one.
  Many other special chars are getting indexed and stored properly; only a
  few characters cause pain.
 

 Could you double-check the encoding on the mysql side?
 What is the output of

 mysql> SHOW VARIABLES LIKE 'character\_set\_%';

 Regards,
 Gora



Re: DIH import from MySQL results in garbage text for special chars

2012-09-27 Thread Hasan Diwan
Mr Prakash,

On 27 September 2012 02:06, Pranav Prakash pra...@gmail.com wrote:

 +--------------------------+--------+
 | Variable_name            | Value  |
 +--------------------------+--------+
 | character_set_client     | latin1 |
 | character_set_connection | latin1 |
 | character_set_database   | latin1 |
 | character_set_filesystem | binary |
 | character_set_results    | latin1 |
 | character_set_server     | latin1 |
 | character_set_system     | utf8   |
 +--------------------------+--------+


These should all be the same (presumably the system encoding).  -- H
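
One hedged way to line these up from the importing side is to force UTF-8 on
the DIH JDBC connection rather than touching my.cnf (a sketch; useUnicode and
characterEncoding are standard Connector/J parameters, and the host/db names
are placeholders):

    // Sketch: ask Connector/J to convert to UTF-8 on the wire, so Solr
    // receives UTF-8 regardless of the server's latin1 defaults.
    String url = "jdbc:mysql://localhost:3306/mydb"
               + "?useUnicode=true&characterEncoding=UTF-8";

Whether this cures the garbage depends on how the bytes were originally
stored; if UTF-8 bytes were stuffed into latin1 columns, the conversion will
need care.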
-- 
Sent from my mobile device


Re: Proximity(tilde) combined with wildcard, AutomatonQuery ?

2012-09-27 Thread Vadim Kisselmann
Hi Ahmet,
thanks for your reply:)
I see that it does not come with the 4.0 release, because the given
patches do not work with this version.
Right?
Best regards
Vadim


2012/9/26 Ahmet Arslan iori...@yahoo.com:

 we assume i have a simple query like this with wildcard and
 tilde:

 "japa* fukushima"~10

 instead of "japan fukushima"~10 OR "japanese fukushima"~10,
 etc.

 Do we have a solution in Solr 4.0 to work with these kind of
 queries?

 Vadim, two open jira issues:

 https://issues.apache.org/jira/browse/SOLR-1604
 https://issues.apache.org/jira/browse/LUCENE-1486
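
SOLR-1604 is about exposing Lucene's ComplexPhraseQueryParser, which is what
permits a wildcard inside a proximity phrase. A hedged sketch of the syntax
such a parser targets, assuming the patch from the issue applies and
registers the parser as complexphrase:

    {!complexphrase}text:"japa* fukushima"~10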



Re: Items disappearing from Solr index

2012-09-27 Thread Kissue Kissue
#What is the field type for that field - string or text?

It is a string type.

Thanks.

On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky j...@basetechnology.comwrote:

 What is the field type for that field - string or text?


 -- Jack Krupansky

 -Original Message- From: Kissue Kissue
 Sent: Wednesday, September 26, 2012 1:43 PM

 To: solr-user@lucene.apache.org
 Subject: Re: Items disappearing from Solr index

 # It is looking for documents with "Emory" in the specified field OR "Labs"
 in the default search field.

 This does not seem to be the case. For instance issuing a deleteByQuery for
 catalogueId: "PEARL LINGUISTICS LTD" also deletes the contents of a
 catalogueId with the value: "Ncl_MacNaughtonMcGregorCoaching_vf010811".

 Thanks.

 On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky j...@basetechnology.com*
 *wrote:

  It is looking for documents with "Emory" in the specified field OR "Labs"
 in the default search field.

 -- Jack Krupansky

 -Original Message- From: Kissue Kissue
 Sent: Wednesday, September 26, 2012 7:47 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Items disappearing from Solr index


 I have just solved this problem.

 We have a field called catalogueId. One possible value for this field
 could
 be Emory Labs. I found out that when the following delete by query is
 sent to solr:

 getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs")   [Notice that
 there are no quotes surrounding the catalogueId value - "Emory Labs"]

 For some reason this delete by query ends up deleting the contents of some
 other random catalogues too, which is the reason why we are losing items
 from the index. When the query is changed to:

 getSolrServer().deleteByQuery(catalogueId + ":" + "\"Emory Labs\""), then it

 starts to correctly delete only items in the Emory Labs catalogue.

 So my first question is, what exactly does deleteByQuery do in the first
 query without the quotes? How is it determining which catalogues to
 delete?

 Secondly, shouldn't the correct behaviour be not to delete anything at all
 in this case since when a search is done for the same catalogueId without
 the quotes it just simply returns no results?

 Thanks.


 On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue kissue...@gmail.com
 wrote:

  Hi Erick,


 Thanks for your reply. Yes I am using delete by query. I am currently
 logging the number of items to be deleted before handing off to solr. And
 from solr logs I can see it deleted exactly that number. I will verify
 further.

 Thanks.


 On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson erickerick...@gmail.com
 
 **wrote:


  How do you delete items? By ID or by query?


 My guess is that one of two things is happening:
 1) your delete process is deleting too much data.
 2) your index process isn't indexing what you think.

 I'd add some logging to the SolrJ program to see what
 it thinks it has deleted or added to the index and go from there.

 Best
 Erick

 On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue kissue...@gmail.com
 wrote:
  Hi,
 
  I am running Solr 3.5, using SolrJ and using StreamingUpdateSolrServer
 to
  index and delete items from solr.
 
  I basically index items from the db into solr every night. Existing
 items
  can be marked for deletion in the db and a delete request sent to solr
 to
  delete such items.
 
  My process runs as follows every night:
 
  1. Check if items have been marked for deletion and delete from solr.
  I
  commit and optimize after the entire solr deletion runs.
  2. Index any new items to solr. I commit and optimize after all the 
 new
  items have been added.
 
  Recently i started noticing that huge chunks of items that have not 
 been
  marked for deletion are disappearing from the index. I checked the 
 solr
  logs and the logs indicate that it is deleting exactly the number of
 items
  requested but still a lot of other items disappear from the index from
 time
  to time. Any ideas what might be causing this or what I am doing wrong?
 
 
  Thanks.









Re: How can I create about 100000 independent indexes in Solr?

2012-09-27 Thread Tanguy Moal
Hello Monton,

I wanted to make sure that you understood me well: I really don't know how
well Solr scales if the number of fields increases...

What I mean here is that the more distinct fields you index, the more
memory you will need.

So if in your schema, you have something like 15 fields declared, then
storing data for 100 distinct customers would generate 1500 fields in the
index.

I really don't know how well that would scale.

The simplest solution is one core per customer, but the same issue (memory
consumption) will arise at some point, I guess.

There must be a clever way to do that...

--
Tanguy
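
A minimal SolrJ sketch of the dynamic-field scheme quoted below (a sketch
only; customer1 and *_field_a1 are the hypothetical names from the example,
and server stands for an already-constructed SolrServer):

    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument doc = new SolrInputDocument();
    String customer = "customer1";        // hypothetical customer id
    doc.addField("customer", customer);
    // The value lands in a customer-prefixed field matched by the dynamic
    // field pattern *_field_a1, keeping per-customer term statistics apart.
    doc.addField(customer + "_field_a1", "value for field_a1");
    server.add(doc);
    server.commit();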

2012/9/26 韦震宇 weizhe...@win-trust.com

 Hi, Tanguy
  I would do as your suggestion.
 Best Regards!
 Monton
 - Original Message -
 From: Tanguy Moal tanguy.m...@gmail.com
 To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk
 Sent: Tuesday, September 25, 2012 11:05 PM
 Subject: Re: How can I create about 100000 independent indexes in Solr?


 That is an interesting issue...
 I was wondering if relying on dynamic fields could be an option...

 Something like :

 * field_name: field_type
 * customer : string
 * *_field_a1 : type_a
 * *_field_a2 : type_a
 * *_field_b1 : type_b
 * ...

 And then prefix each field with the customer name, so for customer1, indexed
 documents are as follow :
 * customer : customer1
 * customer1_field_a1 : value for field_a1
 * customer1_field_a2 : value for field_a2
 * customer1_field_b1 : value for field_b1
 * ...
 And for customer2 :
 * customer : customer2
 * customer2_field_a1 : value for field_a1
 * customer2_field_a2 : value for field_a2
 * customer2_field_b1 : value for field_b1
 * ...

 This solution is simple, and helps isolate each customer's fields so
 features like suggester, spellcheck, ..., things relying on frequencies
 would work (as if in a single core)

 I just don't know how well Solr scales if the number of fields increases...

 Then scaling could be achieved depending on the number of docs / customer and
 the number of customers / core (if the amount of fields consumes resources)

 Could that help ?

 --
 Tanguy

 2012/9/25 Toke Eskildsen t...@statsbiblioteket.dk

  On Tue, 2012-09-25 at 04:21 +0200, 韦震宇 wrote:
   The company I'm working in has a website to serve more than 100000
   customers, and every customer should have its own search category.
   So I should create an independent index for every customer.
 
  How many of the customers are active at any given time and how large are
  the indexes? Depending on usage you might be able to have a limited
  number of indexes open at any given time and opening new indexes on
  demand.
 
 



ExtractingRequestHandler causes Out of Memory Error

2012-09-27 Thread Shigeki Kobayashi
Hi guys,


I use ManifoldCF to crawl files on a Windows file server and index them to
Solr using the Extracting Request Handler.
Most of the documents are successfully indexed, but some fail and an Out
of Memory Error occurs in Solr, so I need some advice.

The failed files are not so big: a CSV file of 240MB and a text file of 170MB.

Here is environment and machine spec:
Solr 3.6 (also Solr4.0Beta)
Tomcat 6.0
CentOS 5.6
java version 1.6.0_23
HDD 60GB
MEM 2GB
JVM Heap: -Xmx1024m -Xms1024m

I feel there is enough memory that Solr should be able to extract and index
file content.
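
A rough back-of-envelope estimate suggests otherwise (an estimate only,
assuming, as the stack trace below indicates, that the extracted text is
accumulated in a single StringBuilder):

    long fileBytes = 240L * 1024 * 1024;  // the failing CSV
    long charBytes = fileBytes * 2;       // Java chars are 2 bytes: ~480 MB
    // expandCapacity copies into a roughly doubled array, so the old and
    // new buffers briefly coexist during the final growth step:
    long peakBytes = charBytes + charBytes * 2;  // ~1.4 GB at the worst copy
    // which is well past -Xmx1024m before any index structures are built.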

Here is a Solr log below:
--
[solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
        at java.lang.StringBuilder.append(StringBuilder.java:189)
        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

-

Anyone has any ideas?

Regards,

Shigeki


Re: How to retrieve value from float field in custom request handler?

2012-09-27 Thread ravicv
Thanks guys.. I was able to retrieve all values now..
But why does the Solr Field class not have a method to retrieve values for all
data types? Something similar to:

Object obj = doc.getField("Field1");

Why is only stringValue exposed in this Field class?

doc.getField("Field1").stringValue()

Thanks,
ravi
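
For what it's worth, a sketch of pulling a typed value back out in a 3.x
custom handler (assumes a stored field; "Field1" is the placeholder name from
the question, and searcher/docId come from the handler's own context):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Fieldable;

    Document doc = searcher.doc(docId);
    Fieldable f = doc.getFieldable("Field1");         // placeholder name
    float value = Float.parseFloat(f.stringValue());  // stored values come back as strings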





Problem with Special Characters in SOLR Query

2012-09-27 Thread aniljayanti
Hi ,

I'm using the text_general fieldType for searching in SOLR. While searching
keywords along with special characters I am not getting proper results, and
I am getting errors. I used special characters like below.
1) -
2) &
3) +

QUERY ::

solr?q=Healing - Live
solr?q=Healing & Live
solr?q=Healing ? Live

Error message:

The request sent by the client was syntactically incorrect
(org.apache.lucene.queryParser.ParseException: Cannot parse '(Healing \':
Lexical error at line 1, column 8. Encountered: <EOF> after : \Healing
\\).


schema.xml
---

<fieldType name="text_generalold" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


<field name="title" type="text_general" indexed="true" stored="true"/>

<field name="text" type="text_general" indexed="true" stored="false"
    multiValued="true"/>

<defaultSearchField>text</defaultSearchField>

<copyField source="title" dest="text"/>

Please suggest me in this, and thanks in advance.

AnilJayanti





Re: Problem with Special Characters in SOLR Query

2012-09-27 Thread irshad siddiqui
Hi,
Just escape all Solr special chars.

Example:
solr?q=Healing \& Live

Regards,
Irshad
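
If the query string is assembled in SolrJ, there is a helper worth knowing
about; a sketch (ClientUtils.escapeQueryChars backslash-escapes Lucene query
metacharacters, though a & travelling inside a URL still needs URL-encoding,
as discussed later in this thread):

    import org.apache.solr.client.solrj.util.ClientUtils;

    // Backslash-escapes characters the query parser treats specially
    String safe = ClientUtils.escapeQueryChars("Healing & Live");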

On Thu, Sep 27, 2012 at 3:55 PM, aniljayanti anil.jaya...@gmail.com wrote:

 Hi ,

 I'm using the text_general fieldType for searching in SOLR. While searching
 keywords along with special characters I am not getting proper results, and
 I am getting errors. I used special characters like below.
 1) -
 2) &
 3) +

 QUERY ::

 solr?q=Healing - Live
 solr?q=Healing & Live
 solr?q=Healing ? Live

 Error message:

 The request sent by the client was syntactically incorrect
 (org.apache.lucene.queryParser.ParseException: Cannot parse '(Healing \':
 Lexical error at line 1, column 8. Encountered: <EOF> after : \Healing
 \\).


 schema.xml
 ---

 <fieldType name="text_generalold" class="solr.TextField"
     positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


 <field name="title" type="text_general" indexed="true" stored="true"/>

 <field name="text" type="text_general" indexed="true" stored="false"
     multiValued="true"/>

 <defaultSearchField>text</defaultSearchField>

 <copyField source="title" dest="text"/>

 Please suggest me in this, and thanks in advance.

 AnilJayanti



Re: Solr Replication and Autocommit

2012-09-27 Thread Erick Erickson
I'll echo Otis, nothing comes to mind...

Unless you were indexing stuff to the _slaves_, which you should
never do, now or in the past

Erick

On Thu, Sep 27, 2012 at 12:00 AM, Aleksey Vorona avor...@ea.com wrote:
 Hi,

 I remember having some issues with replication and autocommit previously.
 But now we are using Solr 3.6.1. Are there any known issues or any other
 reasons to avoid autocommit while using replication? I guess not, just want
 confirmation from someone confident and competent.

 -- Aleksey


Re: How can I create about 100000 independent indexes in Solr?

2012-09-27 Thread 韦震宇
Hi, Tanguy
   Oh, I understand now. I don't have the same issue as you. Though there
 are many customers on our site, the fields they own are the same,
 so a few fields are OK in my scenario.
Best Regards!
Monton

- Original Message - 
From: Tanguy Moal tanguy.m...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, September 27, 2012 4:34 PM
Subject: Re: How can I create about 100000 independent indexes in Solr?


Hello Monton,

I wanted to make sure that you understood me well: I really don't know how
well Solr scales if the number of fields increases...

What I mean here is that the more distinct fields you index, the more
memory you will need.

So if in your schema, you have something like 15 fields declared, then
storing data for 100 distinct customers would generate 1500 fields in the
index.

I really don't know how well that would scale.

The simplest solution is one core per customer, but the same issue (memory
consumption) will arise at some point, I guess.

There must be a clever way to do that...

--
Tanguy

2012/9/26 韦震宇 weizhe...@win-trust.com

 Hi, Tanguy
  I would do as your suggestion.
 Best Regards!
 Monton
 - Original Message -
 From: Tanguy Moal tanguy.m...@gmail.com
 To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk
 Sent: Tuesday, September 25, 2012 11:05 PM
 Subject: Re: How can I create about 100000 independent indexes in Solr?


 That is an interesting issue...
 I was wondering if relying on dynamic fields could be an option...

 Something like :

 * field_name: field_type
 * customer : string
 * *_field_a1 : type_a
 * *_field_a2 : type_a
 * *_field_b1 : type_b
 * ...

 And then prefix each field with the customer name, so for customer1, indexed
 documents are as follow :
 * customer : customer1
 * customer1_field_a1 : value for field_a1
 * customer1_field_a2 : value for field_a2
 * customer1_field_b1 : value for field_b1
 * ...
 And for customer2 :
 * customer : customer2
 * customer2_field_a1 : value for field_a1
 * customer2_field_a2 : value for field_a2
 * customer2_field_b1 : value for field_b1
 * ...

 This solution is simple, and helps isolate each customer's fields so
 features like suggester, spellcheck, ..., things relying on frequencies
 would work (as if in a single core)

 I just don't know how well Solr scales if the number of fields increases...

 Then scaling could be achieved depending on the number of docs / customer and
 the number of customers / core (if the amount of fields consumes resources)

 Could that help ?

 --
 Tanguy

 2012/9/25 Toke Eskildsen t...@statsbiblioteket.dk

  On Tue, 2012-09-25 at 04:21 +0200, 韦震宇 wrote:
   The company I'm working in has a website to serve more than 100000
   customers, and every customer should have its own search category.
   So I should create an independent index for every customer.
 
  How many of the customers are active at any given time and how large are
  the indexes? Depending on usage you might be able to have a limited
  number of indexes open at any given time and opening new indexes on
  demand.
 
 



httpSolrServer and external load balancer

2012-09-27 Thread Lee Carroll
Hi

We have the following solr http server

<bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer"
      id="solrserver">
    <constructor-arg value="urlToSlaveLoadBalancer"/>
    <property name="soTimeout" value="1000"/>
    <property name="connectionTimeout" value="1000"/>
    <property name="defaultMaxConnectionsPerHost" value="5"/>
    <property name="maxTotalConnections" value="20"/>
    <property name="allowCompression" value="true"/>
</bean>

The issue we face is that the f5 balancer is returning a cookie which the
client is hanging onto, resulting in the same slave being hit for all requests.

One obvious solution is to configure the load balancer to be non-sticky;
however, politically, a non-standard load balancer is timescale suicide.
(It is an outsourced corporate thing.)

I'm not keen to use the LB http solr server as I don't want this to be a
concern of the software and have a list of servers etc. (although as a stop
gap I may well have to).

My question is: can I configure the solr server to ignore client state? We
are on solr 3.4.

Thanks in advance lee c


Re: Items disappearing from Solr index

2012-09-27 Thread Erick Erickson
Wild shot in the dark

What happens if you switch from StreamingUpdateSolrServer to HttpSolrServer?

What I'm wondering is if somehow you're getting a queueing problem. If you have
multiple threads defined for SUSS, it might be possible (and I'm guessing) that
the delete bit is getting sent after some of the adds. Frankly I doubt this is
the case, but this issue is so weird that I'm grasping at straws.

BTW, there's no reason to optimize twice. Actually, the new thinking is that
optimizing usually isn't necessary anyway. But if you insist on optimizing
there's no reason to do it _both_ after the deletes and after the adds, just
do it after the adds.

Best
Erick

On Thu, Sep 27, 2012 at 4:31 AM, Kissue Kissue kissue...@gmail.com wrote:
 #What is the field type for that field - string or text?

 It is a string type.

 Thanks.

 On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky 
 j...@basetechnology.comwrote:

 What is the field type for that field - string or text?


 -- Jack Krupansky

 -Original Message- From: Kissue Kissue
 Sent: Wednesday, September 26, 2012 1:43 PM

 To: solr-user@lucene.apache.org
 Subject: Re: Items disappearing from Solr index

 # It is looking for documents with "Emory" in the specified field OR "Labs"
 in the default search field.

 This does not seem to be the case. For instance issuing a deleteByQuery for
 catalogueId: "PEARL LINGUISTICS LTD" also deletes the contents of a
 catalogueId with the value: "Ncl_MacNaughtonMcGregorCoaching_vf010811".

 Thanks.

 On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky j...@basetechnology.com*
 *wrote:

  It is looking for documents with "Emory" in the specified field OR "Labs"
 in the default search field.

 -- Jack Krupansky

 -Original Message- From: Kissue Kissue
 Sent: Wednesday, September 26, 2012 7:47 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Items disappearing from Solr index


 I have just solved this problem.

 We have a field called catalogueId. One possible value for this field
 could
 be Emory Labs. I found out that when the following delete by query is
 sent to solr:

 getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs")   [Notice that
 there are no quotes surrounding the catalogueId value - "Emory Labs"]

 For some reason this delete by query ends up deleting the contents of some
 other random catalogues too, which is the reason why we are losing items
 from the index. When the query is changed to:

 getSolrServer().deleteByQuery(catalogueId + ":" + "\"Emory Labs\""), then it

 starts to correctly delete only items in the Emory Labs catalogue.

 So my first question is, what exactly does deleteByQuery do in the first
 query without the quotes? How is it determining which catalogues to
 delete?

 Secondly, shouldn't the correct behaviour be not to delete anything at all
 in this case since when a search is done for the same catalogueId without
 the quotes it just simply returns no results?

 Thanks.


 On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue kissue...@gmail.com
 wrote:

  Hi Erick,


 Thanks for your reply. Yes I am using delete by query. I am currently
 logging the number of items to be deleted before handing off to solr. And
 from solr logs I can see it deleted exactly that number. I will verify
 further.

 Thanks.


 On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson erickerick...@gmail.com
 
 **wrote:


  How do you delete items? By ID or by query?


 My guess is that one of two things is happening:
 1) your delete process is deleting too much data.
 2) your index process isn't indexing what you think.

 I'd add some logging to the SolrJ program to see what
 it thinks it has deleted or added to the index and go from there.

 Best
 Erick

 On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue kissue...@gmail.com
 wrote:
  Hi,
 
  I am running Solr 3.5, using SolrJ and using StreamingUpdateSolrServer
 to
  index and delete items from solr.
 
  I basically index items from the db into solr every night. Existing
 items
  can be marked for deletion in the db and a delete request sent to solr
 to
  delete such items.
 
  My process runs as follows every night:
 
  1. Check if items have been marked for deletion and delete from solr.
  I
  commit and optimize after the entire solr deletion runs.
  2. Index any new items to solr. I commit and optimize after all the 
 new
  items have been added.
 
  Recently i started noticing that huge chunks of items that have not 
 been
  marked for deletion are disappearing from the index. I checked the 
 solr
  logs and the logs indicate that it is deleting exactly the number of
 items
  requested but still a lot of other items disappear from the index from
 time
  to time. Any ideas what might be causing this or what I am doing wrong?
 
 
  Thanks.









Re: Problem with Special Characters in SOLR Query

2012-09-27 Thread aniljayanti
Hi,

thanks,

I tried with below query getting result.

q=Cheat \- Album Version

But getting error with below.

q=Oot \& Aboot

Error message :
--
message org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \':
Lexical error at line 1, column 6. Encountered: <EOF> after :

description The request sent by the client was syntactically incorrect
(org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \': Lexical
error at line 1, column 6. Encountered: <EOF> after : ).

anilJayanti







Re: httpSolrServer and external load balancer

2012-09-27 Thread Erick Erickson
What client state? Solr servers are stateless, they don't
keep any information specific to particular clients so this
doesn't seem to be a problem.

What Solr _does_ do is cache things like fq clauses, but
these are not user-specific. Which actually argues for going
to the same slave on the theory that requests from a
user are more likely to have the same fq clauses. Consider
faceting on shoes. The user clicks mens and you add an
fq like fq=gender:mens. Then the user wants dress shoes
so you submit another query fq=gender:mens&fq=style:dress.
The first fq clause has already been calculated and cached so
doesn't have to be re-calculated for the second query...

But the stickiness is usually the way Solr is used, so this seems
like a red herring.

FWIW,
Erick

On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll
lee.a.carr...@googlemail.com wrote:
 Hi

 We have the following solr http server

 <bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer"
       id="solrserver">
     <constructor-arg value="urlToSlaveLoadBalancer"/>
     <property name="soTimeout" value="1000"/>
     <property name="connectionTimeout" value="1000"/>
     <property name="defaultMaxConnectionsPerHost" value="5"/>
     <property name="maxTotalConnections" value="20"/>
     <property name="allowCompression" value="true"/>
 </bean>

 The issue we face is the f5 balancer is returning a cookie which the client
 is hanging onto. resulting in the same slave being hit for all requests.

 one obvious solution is to config the load balancer to be non sticky
 however politically a non-standard load balancer is timescale suicide.
 (It is an out sourced corporate thing)

 I'm not keen to use the LB http solr server as i don't want this to be a
 concern of the software and have a list of servers etc. (although as a stop
 gap may well have to)

 My question is can I configure the solr server to ignore client state ? We
 are on solr 3.4

 Thanks in advance lee c


Re: Problem with Special Characters in SOLR Query

2012-09-27 Thread Toke Eskildsen
On Thu, 2012-09-27 at 13:49 +0200, aniljayanti wrote:
 But getting error with below.
 
 q=Oot \& Aboot

 Error message :
 --
 message org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \':
 Lexical error at line 1, column 6. Encountered: <EOF> after :

It seems like you are sending the query by performing a REST-call.
You need to URL-escape those, because & in a URL is a delimiter for
arguments.

Instead of
http://localhost:8983/solr/collection1/select/?q=Oot \& Aboot
you need to send
http://localhost:8983/solr/collection1/select/?q=Oot%20%5C%26%20Aboot
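
If the URL is being assembled by hand, the JDK can do the encoding; a minimal
sketch:

    import java.net.URLEncoder;

    // Encodes to Oot+%5C%26+Aboot (the + is an encoded space)
    String q = URLEncoder.encode("Oot \\& Aboot", "UTF-8");
    String url = "http://localhost:8983/solr/collection1/select/?q=" + q;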



Re: Problem with Special Characters in SOLR Query

2012-09-27 Thread Erick Erickson
Right, you're conflating two separate issues
1) URL escaping. The & is a special character in the URL, entirely
separate from Solr. Try using %26 rather than \&.
2) Query parsing. Once the string gets through the URL and servlet
   container, it's in query parsing land, where the escaping of
   _query_ special characters like '-' counts.
3) And just to confuse matters a LOT, when you're looking at
 URLs, space is translated to '+'. So when you look in your log
 file, you'll see the query q=me myself reported as
 q=me+myself which has nothing to do with the Lucene MUST
(+) operator

Best
Erick

On Thu, Sep 27, 2012 at 7:49 AM, aniljayanti anil.jaya...@gmail.com wrote:
 Hi,

 thanks,

 I tried with below query getting result.

 q=Cheat \- Album Version

 But getting error with below.

 q=Oot \& Aboot

 Error message :
 --
 message org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \':
 Lexical error at line 1, column 6. Encountered: <EOF> after :

 description The request sent by the client was syntactically incorrect
 (org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \': Lexical
 error at line 1, column 6. Encountered: <EOF> after : ).

 anilJayanti







Re: httpSolrServer and external load balancer

2012-09-27 Thread Lee Carroll
Hi Erick,

the load balancer in front of the solr servers is dropping the cookie, not
the solr servers themselves.

Are you saying the connections the client's http connection manager builds
will ignore this state? It looks like they do not. It looks like the
client is passing the cookie back to the load balancer.

I want to configure the clients not to pass cookies, basically.

Does that make sense?



On 27 September 2012 12:54, Erick Erickson erickerick...@gmail.com wrote:

 What client state? Solr servers are stateless, they don't
 keep any information specific to particular clients so this
 doesn't seem to be a problem.

 What Solr _does_ do is cache things like fq clauses, but
 these are not user-specific. Which actually argues for going
 to the same slave on the theory that requests from a
 user are more likely to have the same fq clauses. Consider
 faceting on shoes. The user clicks mens and you add an
 fq like fq=gender:mens. Then the user wants dress shoes
 so you submit another query fq=gender:mens&fq=style:dress.
 The first fq clause has already been calculated and cached so
 doesn't have to be re-calculated for the second query...

 But the stickiness is usually the way Solr is used, so this seems
 like a red herring.

 FWIW,
 Erick

 On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll
 lee.a.carr...@googlemail.com wrote:
  Hi
 
  We have the following solr http server
 
  <bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer"
        id="solrserver">
      <constructor-arg value="urlToSlaveLoadBalancer"/>
      <property name="soTimeout" value="1000"/>
      <property name="connectionTimeout" value="1000"/>
      <property name="defaultMaxConnectionsPerHost" value="5"/>
      <property name="maxTotalConnections" value="20"/>
      <property name="allowCompression" value="true"/>
  </bean>
 
  The issue we face is the f5 balancer is returning a cookie which the
 client
  is hanging onto. resulting in the same slave being hit for all requests.
 
  one obvious solution is to config the load balancer to be non sticky
  however politically a non-standard load balancer is timescale suicide.
  (It is an out sourced corporate thing)
 
  I'm not keen to use the LB http solr server as i don't want this to be a
  concern of the software and have a list of servers etc. (although as a
 stop
  gap may well have to)
 
  My question is can I configure the solr server to ignore client state ?
 We
  are on solr 3.4
 
  Thanks in advance lee c



Re: httpSolrServer and external load balancer

2012-09-27 Thread Erick Erickson
But again, why do you want to do this? I really think you don't.

I'm assuming that when you say this:
...resulting in the same slave being hit for all requests.

you mean all requests _from the same client_. If that's
not what's happening, then disregard my maundering
because when it comes to setting up LBs, I'm clueless. But
I can say that many installations have LBs set up with
sticky sessions on a per-client basis..

Consider another scenario: replication. If you have 2 slaves,
each with a polling interval of 5 minutes note that they are
not coordinated. So slave 1 can poll at 14:00:00. Slave 2
at 14:01:00. Say there's been a commit at 14:00:30. Requests
to slave 2 will have a different view of the index than slave 1,
so if your user resends the exact same request, they may
see different results. I could submit the request 5 times in a
row and the results would not only be different each time, they
would flip-flop back and forth.

I wouldn't do this unless and until you have a demonstrated need.

Best
Erick

On Thu, Sep 27, 2012 at 8:07 AM, Lee Carroll
lee.a.carr...@googlemail.com wrote:
 Hi Erick,

 the load balancer in front of the solr servers is dropping the cookie not
 the solr server themselves.

 are you saying the clients http connection manager builds will ignore this
 state ? it looks like they do not. It looks like the
 client is passing the cookie back to the load balancer

 I want to configure the clients not to pass cookies basically.

 Does that make sense ?



 On 27 September 2012 12:54, Erick Erickson erickerick...@gmail.com wrote:

 What client state? Solr servers are stateless, they don't
 keep any information specific to particular clients so this
 doesn't seem to be a problem.

 What Solr _does_ do is cache things like fq clauses, but
 these are not user-specific. Which actually argues for going
 to the same slave on the theory that requests from a
 user are more likely to have the same fq clauses. Consider
 faceting on shoes. The user clicks mens and you add an
 fq like fq=gender:mens. Then the user wants dress shoes
 so you submit another query fq=gender:mens&fq=style:dress.
 The first fq clause has already been calculated and cached so
 doesn't have to be re-calculated for the second query...

 But the stickiness is usually the way Solr is used, so this seems
 like a red herring.

 FWIW,
 Erick

 On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll
 lee.a.carr...@googlemail.com wrote:
  Hi
 
  We have the following solr http server
 
  <bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer"
        id="solrserver">
      <constructor-arg value="urlToSlaveLoadBalancer"/>
      <property name="soTimeout" value="1000"/>
      <property name="connectionTimeout" value="1000"/>
      <property name="defaultMaxConnectionsPerHost" value="5"/>
      <property name="maxTotalConnections" value="20"/>
      <property name="allowCompression" value="true"/>
  </bean>
 
  The issue we face is the f5 balancer is returning a cookie which the
 client
  is hanging onto. resulting in the same slave being hit for all requests.
 
  one obvious solution is to config the load balancer to be non sticky
  however politically a non-standard load balancer is timescale suicide.
  (It is an out sourced corporate thing)
 
  I'm not keen to use the LB http solr server as i don't want this to be a
  concern of the software and have a list of servers etc. (although as a
 stop
  gap may well have to)
 
  My question is can I configure the solr server to ignore client state ?
 We
  are on solr 3.4
 
  Thanks in advance lee c



Re: Items disappearing from Solr index

2012-09-27 Thread Kissue Kissue
Actually this problem occurs even when i am doing just deletes. I tested by
sending only one delete query for a single catalogue and had the same
problem. I always optimize once.

I changed to the syntax you suggested ({!term f=catalogueId}Emory Labs)
and it works like a charm. Thanks for the pointer, it saved me from another
issue that could have occurred at some point.

Thanks.
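
To spell out the difference for the archive, a hedged SolrJ sketch
(catalogueId is the field from this thread; server stands for the poster's
SolrServer instance):

    // Goes through the query parser: catalogueId:Emory OR <default field>:Labs,
    // which is how unrelated catalogues were being deleted.
    server.deleteByQuery("catalogueId:" + "Emory Labs");

    // Quoted phrase form keeps the value together, but still gets parsed:
    server.deleteByQuery("catalogueId:\"" + "Emory Labs" + "\"");

    // Term query parser: the raw value is matched as one term, no parsing.
    server.deleteByQuery("{!term f=catalogueId}" + "Emory Labs");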



On Thu, Sep 27, 2012 at 12:30 PM, Erick Erickson erickerick...@gmail.comwrote:

 Wild shot in the dark

 What happens if you switch from StreamingUpdateSolrServer to
 HttpSolrServer?

 What I'm wondering is if somehow you're getting a queueing problem. If you
 have
 multiple threads defined for SUSS, it might be possible (and I'm guessing)
 that
 the delete bit is getting sent after some of the adds. Frankly I doubt
 this is
 the case, but this issue is so weird that I'm grasping at straws.

 BTW, there's no reason to optimize twice. Actually, the new thinking is
 that
 optimizing usually isn't necessary anyway. But if you insist on optimizing
 there's no reason to do it _both_ after the deletes and after the adds,
 just
 do it after the adds.

 Best
 Erick

 On Thu, Sep 27, 2012 at 4:31 AM, Kissue Kissue kissue...@gmail.com
 wrote:
  #What is the field type for that field - string or text?
 
  It is a string type.
 
  Thanks.
 
  On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky j...@basetechnology.com
 wrote:
 
  What is the field type for that field - string or text?
 
 
  -- Jack Krupansky
 
  -Original Message- From: Kissue Kissue
  Sent: Wednesday, September 26, 2012 1:43 PM
 
  To: solr-user@lucene.apache.org
  Subject: Re: Items disappearing from Solr index
 
  # It is looking for documents with "Emory" in the specified field OR "Labs"
  in the default search field.
 
  This does not seem to be the case. For instance issuing a deleteByQuery for
  catalogueId: "PEARL LINGUISTICS LTD" also deletes the contents of a
  catalogueId with the value: "Ncl_MacNaughtonMcGregorCoaching_vf010811".
 
  Thanks.
 
  On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky 
 j...@basetechnology.com*
  *wrote:
 
   It is looking for documents with "Emory" in the specified field OR "Labs"
  in the default search field.
 
  -- Jack Krupansky
 
  -Original Message- From: Kissue Kissue
  Sent: Wednesday, September 26, 2012 7:47 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Items disappearing from Solr index
 
 
  I have just solved this problem.
 
  We have a field called catalogueId. One possible value for this field
  could
  be Emory Labs. I found out that when the following delete by query is
  sent to solr:
 
  getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs")   [Notice that
  there are no quotes surrounding the catalogueId value - "Emory Labs"]
 
  For some reason this delete by query ends up deleting the contents of some
  other random catalogues too, which is the reason why we are losing items
  from the index. When the query is changed to:
 
  getSolrServer().deleteByQuery(catalogueId + ":" + "\"Emory Labs\""), then it
 
  starts to correctly delete only items in the Emory Labs catalogue.
 
  So my first question is, what exactly does deleteByQuery do in the
 first
  query without the quotes? How is it determining which catalogues to
  delete?
 
  Secondly, shouldn't the correct behaviour be not to delete anything at
 all
  in this case since when a search is done for the same catalogueId
 without
  the quotes it just simply returns no results?
 
  Thanks.
 
 
  On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue kissue...@gmail.com
  wrote:
 
   Hi Erick,
 
 
  Thanks for your reply. Yes I am using delete by query. I am currently
  logging the number of items to be deleted before handing off to solr. And
  from solr logs I can see it deleted exactly that number. I will verify
  further.
 
  Thanks.
 
 
  On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson 
 erickerick...@gmail.com
  
  **wrote:
 
 
   How do you delete items? By ID or by query?
 
 
  My guess is that one of two things is happening:
  1) your delete process is deleting too much data.
  2) your index process isn't indexing what you think.

  I'd add some logging to the SolrJ program to see what
  it thinks it has deleted or added to the index and go from there.
 
  Best
  Erick
 
  On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue kissue...@gmail.com
  wrote:
   Hi,
  
   I am running Solr 3.5, using SolrJ and using
 StreamingUpdateSolrServer
  to
   index and delete items from solr.
  
   I basically index items from the db into solr every night. Existing
  items
   can be marked for deletion in the db and a delete request sent to
 solr
  to
   delete such items.
  
   My process runs as follows every night:
  
   1. Check if items have been marked for deletion and delete from
 solr.
   I
   commit and optimize after the entire solr deletion runs.
   2. Index any new items to solr. I commit and optimize after all
 the 
  new
   items have been added.
  
  

Query filtering

2012-09-27 Thread Finotti Simone
Hello,
I'm doing this query to return top 10 facets within a given context, 
specified via the fq parameter.

http://solr/core/select?fq=(...)&q=*:*&rows=0&facet.field=interesting_facet&facet.limit=10

Now, I should search for a term inside the context AND the previously 
identified top 10 facet values.

Is there a way to do this with a single query?

thank you in advance,
S


Regarding delta-import and full-import

2012-09-27 Thread darshan
Hi All,

Can anyone refer me to a few blogs that explain both
imports in a little more detail, with examples?

Thanks,

Darshan



Re: Regarding delta-import and full-import

2012-09-27 Thread Koji Sekiguchi

(12/09/27 22:45), darshan wrote:

Hi All,

 Can anyone refer me to a few blogs that explain both
imports in a little more detail, with examples?



Thanks,

Darshan




Asking Google, I got:

http://www.arunchinnachamy.com/apache-solr-mysql-data-import/
http://www.andornot.com/blog/post/Sample-Solr-DataImportHandler-for-XML-Files.aspx
http://pooteeweet.org/blog/1827

:

koji
--
http://soleami.com/blog/starting-lab-work.html
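
As quick orientation for those links: full-import rebuilds the index from the
main query, while delta-import runs the configured deltaQuery to pick up only
rows changed since the last run. Both are triggered over HTTP; a sketch,
assuming DIH is registered at /dataimport:

    http://localhost:8983/solr/dataimport?command=full-import
    http://localhost:8983/solr/dataimport?command=delta-import&clean=false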


Re: httpSolrServer and external load balancer

2012-09-27 Thread Lee Carroll
Hi Erick

Our application has one CommonsHttpSolrServer for each solr core used by
our web app. Whilst we have many web app clients,
solr only has one client: our application. Does that make sense? This is why
sticky load balancing is an issue for us.

I cannot see anywhere that the state is being handled in the
CommonsHttpSolrServer impl. It looks like the state is not being passed
by the client, or am I missing something?

Cheers Lee c

On 27 September 2012 14:00, Erick Erickson erickerick...@gmail.com wrote:

 But again, why do you want to do this? I really think you don't.

 I'm assuming that when you say this:
 ...resulting in the same slave being hit for all requests.

 you mean all requests _from the same client_. If that's
 not what's happening, then disregard my maundering
 because when it comes to setting up LBs, I'm clueless. But
 I can say that many installations have LBs set up with
 sticky sessions on a per-client basis..

 Consider another scenario; replication. If you have 2 slaves,
 each with a polling interval of 5 minutes note that they are
 not coordinated. So slave 1 can poll at 14:00:00. Slave 2
 at 14:01:00. Say there's been a commit at 14:00:30. Requests
 to slave 2 will have a different view of the index than slave 1,
 so if your user resends the exact same request, they may
 see different results. I could submit the request 5 times in a
 row and the results would not only be different each time, they
 would flip-flop back and forth.

 I wouldn't do this unless and until you have a demonstrated need.

 Best
 Erick

 On Thu, Sep 27, 2012 at 8:07 AM, Lee Carroll
 lee.a.carr...@googlemail.com wrote:
  Hi Erick,
 
  the load balancer in front of the solr servers is dropping the cookie not
  the solr server themselves.
 
  are you saying the clients http connection manager builds will ignore
 this
  state ? it looks like they do not. It looks like the
  client is passing the cookie back to the load balancer
 
  I want to configure the clients not to pass cookies basically.
 
  Does that make sense ?
 
 
 
  On 27 September 2012 12:54, Erick Erickson erickerick...@gmail.com
 wrote:
 
  What client state? Solr servers are stateless, they don't
  keep any information specific to particular clients so this
  doesn't seem to be a problem.
 
  What Solr _does_ do is cache things like fq clauses, but
  these are not user-specific. Which actually argues for going
  to the same slave on the theory that requests from a
  user are more likely to have the same fq clauses. Consider
  faceting on shoes. The user clicks mens and you add an
  fq like fq=gender:mens. Then the user wants dress shoes
  so you submit another query fq=gender:mens&fq=style:dress.
  The first fq clause has already been calculated and cached so
  doesn't have to be re-calculated for the second query...
 
  But the stickiness is usually the way Solr is used, so this seems
  like a red herring.
 
  FWIW,
  Erick
 
  On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll
  lee.a.carr...@googlemail.com wrote:
   Hi
  
   We have the following solr http server
  
   <bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer"
         id="solrserver">
       <constructor-arg value="urlToSlaveLoadBalancer"/>
       <property name="soTimeout" value="1000"/>
       <property name="connectionTimeout" value="1000"/>
       <property name="defaultMaxConnectionsPerHost" value="5"/>
       <property name="maxTotalConnections" value="20"/>
       <property name="allowCompression" value="true"/>
   </bean>
  
   The issue we face is the f5 balancer is returning a cookie which the
  client
   is hanging onto. resulting in the same slave being hit for all
 requests.
  
   one obvious solution is to config the load balancer to be non sticky
   however politically a non-standard load balancer is timescale
 suicide.
   (It is an out sourced corporate thing)
  
   I'm not keen to use the LB http solr server as i don't want this to
 be a
   concern of the software and have a list of servers etc. (although as a
  stop
   gap may well have to)
  
   My question is can I configure the solr server to ignore client state
 ?
  We
   are on solr 3.4
  
   Thanks in advance lee c
 



Re: Query filtering

2012-09-27 Thread Amit Nithian
I think one way to do this is issue another query and set a bunch of
filter queries to restrict interesting_facet to just those ten
values returned in the first query.

fq=interesting_facet:1 OR interesting_facet:2 etc. &q=context:whatever

Does that help?
Amit
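
So the follow-up request would look something like this (a sketch with
hypothetical facet values, folded onto several lines for readability; quoting
each value avoids the unquoted-value parsing trap discussed in the Items
disappearing thread above):

    http://solr/core/select?q=term_from_user
      &fq=(...)
      &fq=interesting_facet:("facetValue1" OR "facetValue2" OR "facetValue10")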

On Thu, Sep 27, 2012 at 6:33 AM, Finotti Simone tech...@yoox.com wrote:
 Hello,
 I'm doing this query to return top 10 facets within a given context, 
 specified via the fq parameter.

 http://solr/core/select?fq=(...)&q=*:*&rows=0&facet.field=interesting_facet&facet.limit=10

 Now, I should search for a term inside the context AND the previously 
 identified top 10 facet values.

 Is there a way to do this with a single query?

 thank you in advance,
 S


Re: Solr Replication and Autocommit

2012-09-27 Thread Aleksey Vorona

Thank both of you for the responses!

-- Aleksey

On 12-09-27 03:51 AM, Erick Erickson wrote:

I'll echo Otis, nothing comes to mind...

Unless you were indexing stuff to the _slaves_, which you should
never do, now or in the past

Erick

On Thu, Sep 27, 2012 at 12:00 AM, Aleksey Vorona avor...@ea.com wrote:

Hi,

I remember having some issues with replication and autocommit previously.
But now we are using Solr 3.6.1. Are there any known issues or any other
reasons to avoid autocommit while using replication? I guess not, just want
confirmation from someone confident and competent.

-- Aleksey




RE: SolrJ - IOException

2012-09-27 Thread balaji.gandhi
Thanks for your reply. SOLR Server is not stalled. Just the add fails with this 
exception.

Balaji Gandhi, Senior Software Developer, Horizontal Platform Services
Product Engineering  │  Apollo Group, Inc.
1225 W. Washington St.  |  AZ23  |  Tempe, AZ  85281
Phone: 602.713.2417  |  Email: balaji.gan...@apollogrp.edu

Go Green. Don't Print. Moreover soft copies can be indexed by algorithms.

From: roz dev [via Lucene] [mailto:ml-node+s472066n4010037...@n3.nabble.com]
Sent: Monday, September 24, 2012 5:46 PM
To: Balaji Gandhi
Subject: Re: SolrJ - IOException

I have seen this happening

We retry and that works. Is your solr server stalled?

On Mon, Sep 24, 2012 at 4:50 PM, balaji.gandhi
[hidden email] wrote:

 Hi,

 I am encountering this error randomly (under load) when posting to Solr
 using SolrJ.

 Has anyone encountered a similar error?

 org.apache.solr.client.solrj.SolrServerException: IOException occured when
 talking to server at: http://localhost:8080/solr/profile
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107)
        at ...

 Thanks,
 Balaji







RE: SolrJ - IOException

2012-09-27 Thread balaji.gandhi
Here is the stack trace:-

org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server:
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107)
        at org.apache.solr.handler.dataimport.thread.task.SolrUploadTask.upload(SolrUploadTask.java:31)
        at org.apache.solr.handler.dataimport.thread.SolrUploader.run(SolrUploader.java:31)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
        at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
        at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
        at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
        at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
        ... 9 more
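
Given the "we retry and that works" observation earlier in this thread, a
bounded-retry sketch around add() (assumptions: SolrJ 3.x API; three attempts
and a linear backoff are arbitrary choices):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.common.SolrInputDocument;

    void addWithRetry(SolrServer server, SolrInputDocument doc)
        throws SolrServerException, java.io.IOException, InterruptedException {
      for (int attempt = 1; ; attempt++) {
        try {
          server.add(doc);
          return;
        } catch (SolrServerException e) {
          if (attempt >= 3) throw e;     // give up after three tries
          Thread.sleep(200L * attempt);  // brief backoff before retrying
        }
      }
    }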

Balaji Gandhi, Senior Software Developer, Horizontal Platform Services
Product Engineering  │  Apollo Group, Inc.
1225 W. Washington St.  |  AZ23  |  Tempe, AZ  85281
Phone: 602.713.2417  |  Email: balaji.gan...@apollogrp.edu

Go Green. Don't Print. Moreover soft copies can be indexed by algorithms.

From: Toke Eskildsen [via Lucene] 
[mailto:ml-node+s472066n4010082...@n3.nabble.com]
Sent: Tuesday, September 25, 2012 12:19 AM
To: Balaji Gandhi
Subject: Re: SolrJ - IOException

On Tue, 2012-09-25 at 01:50 +0200, balaji.gandhi wrote:
 I am encountering this error randomly (under load) when posting to Solr
 using SolrJ.

 Has anyone encountered a similar error?

 org.apache.solr.client.solrj.SolrServerException: IOException occured when
 talking to server at: http://localhost:8080/solr/profile
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
[...]

This looks suspiciously like a potential bug in the HTTP keep-alive flow
that we encountered some weeks ago. I am guessing that you are issuing
more than 100 separate updates/second. Could you please provide the full
stack trace?



If you reply to this email, your message will be added to the discussion below:
http://lucene.472066.n3.nabble.com/SolrJ-IOException-tp4010026p4010082.html
To unsubscribe from SolrJ - IOException, click 
herehttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4010026code=YmFsYWppLmdhbmRoaUBhcG9sbG9ncnAuZWR1fDQwMTAwMjZ8LTEwNzE2NTA1NDI=.
NAMLhttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml


This message is private and confidential. If you have received it in error, 
please notify the sender and remove it from your system.






How to run Solr Cloud using Tomcat?

2012-09-27 Thread Benjamin, Roy
I've gone through the guide on running Solr Cloud using Jetty but it's not
practical to use JAVA_OPTS etc on real cloud deployments. I don't see how
to extend these instructions to running on Tomcat.

Has anyone run Solr Cloud under Tomcat successfully?  Did they document how?

Thanks

Roy


RE: How to run Solr Cloud using Tomcat?

2012-09-27 Thread Markus Jelsma
Hi - on Debian systems there's a /etc/default/tomcat properties file you can 
use to set your flags.
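
For example (the exact file name varies with the packaged Tomcat version, e.g. /etc/default/tomcat6 on current Debian; the flags shown are illustrative, not the ones you necessarily need):

JAVA_OPTS="$JAVA_OPTS -DzkHost=zk1:2181,zk2:2181,zk3:2181 -DnumShards=2 -Dbootstrap_conf=true"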
 
-Original message-
 From:Benjamin, Roy rbenja...@ebay.com
 Sent: Thu 27-Sep-2012 19:57
 To: solr-user@lucene.apache.org
 Subject: How to run Solr Cloud using Tomcat?
 
 I've gone through the guide on running Solr Cloud using Jetty but it's not
 practical to use JAVA_OPTS etc on real cloud deployments. I don't see how
 to extend these instructions to running on Tomcat.
 
 Has anyone run Solr Cloud under Tomcat successfully?  Did they document how?
 
 Thanks
 
 Roy
 


Re: How to run Solr Cloud using Tomcat?

2012-09-27 Thread Vadim Kisselmann
Hi Roy,
Yep, it works with Tomcat 6 and an external ZooKeeper.
I will publish a blog post about it tomorrow on sentric.ch.
The post is ready, but I had no time to publish it in the last
couple of days :)
Best regards
Vadim



2012/9/27 Markus Jelsma markus.jel...@openindex.io:
 Hi - on Debian systems there's a /etc/default/tomcat properties file you can 
 use to set your flags.

 -Original message-
 From:Benjamin, Roy rbenja...@ebay.com
 Sent: Thu 27-Sep-2012 19:57
 To: solr-user@lucene.apache.org
 Subject: How to run Solr Cloud using Tomcat?

 I've gone through the guide on running Solr Cloud using Jetty but it's not
 practical to use JAVA_OPTS etc on real cloud deployments. I don't see how
 to extend these instructions to running on Tomcat.

 Has anyone run Solr Cloud under Tomcat successfully?  Did they document how?

 Thanks

 Roy



Re: httpSolrServer and exyternal load balancer

2012-09-27 Thread Erick Erickson
Ahh, I think I finally get it. I was missing that the connection in
question is the CommonsHttpSolrServer. That's the thing that's locking
on to a particular slave.

I'm afraid I'm not up enough on the internals here to be much
help, so I'll have to defer.
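
That said, one thing that might be worth trying before going the LB route
(an untested sketch; it assumes SolrJ 3.x exposes the underlying
commons-httpclient instance via getHttpClient(), which the 3.4
CommonsHttpSolrServer appears to do) is to tell the client to ignore
cookies altogether:

import java.net.MalformedURLException;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class NonStickySolrServerFactory {
    // Builds a CommonsHttpSolrServer that never stores or replays the
    // F5 session cookie, so requests keep rotating across the slaves.
    public static CommonsHttpSolrServer create(String url)
            throws MalformedURLException {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
        server.getHttpClient().getParams()
              .setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
        return server;
    }
}

Since the server is built via Spring anyway, the same one-liner could live
in a factory bean instead of the plain bean definition.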

Erick.

On Thu, Sep 27, 2012 at 10:20 AM, Lee Carroll
lee.a.carr...@googlemail.com wrote:
 Hi Erick

 Our application has one CommonsHttpSolrServer for each Solr core used by
 our web app. While we have many web app clients,
 Solr only has one client: our application. Does that make sense? This is why
 sticky load balancing is an issue for us.

 I cannot see anywhere that the state is being handled in the
 CommonsHttpSolrServer impl. It looks like the state is not being passed
 by the client, or am I missing something?

 Cheers Lee c

 On 27 September 2012 14:00, Erick Erickson erickerick...@gmail.com wrote:

 But again, why do you want to do this? I really think you don't.

 I'm assuming that when you say this:
 ...resulting in the same slave being hit for all requests.

 you mean all requests _from the same client_. If that's
 not what's happening, then disregard my maundering
 because when it comes to setting up LBs, I'm clueless. But
 I can say that many installations have LBs set up with
 sticky sessions on a per-client basis.

 Consider another scenario; replication. If you have 2 slaves,
 each with a polling interval of 5 minutes note that they are
 not coordinated. So slave 1 can poll at 14:00:00. Slave 2
 at 14:01:00. Say there's been a commit at 14:00:30. Requests
 to slave 2 will have a different view of the index than slave 1,
 so if your user resends the exact same request, they may
 see different results. I could submit the request 5 times in a
 row and the results would not only be different each time, they
 would flip-flop back and forth.

 I wouldn't do this unless and until you have a demonstrated need.

 Best
 Erick

 On Thu, Sep 27, 2012 at 8:07 AM, Lee Carroll
 lee.a.carr...@googlemail.com wrote:
  Hi Erick,
 
  the load balancer in front of the Solr servers is the one setting the
  cookie, not the Solr servers themselves.
 
  Are you saying the HTTP connection manager the client builds will ignore
  this state? It looks like it does not; it looks like the
  client is passing the cookie back to the load balancer.
 
  I want to configure the clients not to pass cookies basically.
 
  Does that make sense ?
 
 
 
  On 27 September 2012 12:54, Erick Erickson erickerick...@gmail.com
 wrote:
 
  What client state? Solr servers are stateless, they don't
  keep any information specific to particular clients so this
  doesn't seem to be a problem.
 
  What Solr _does_ do is cache things like fq clauses, but
  these are not user-specific. Which actually argues for going
  to the same slave on the theory that requests from a
  user are more likely to have the same fq clauses. Consider
  faceting on shoes. The user clicks mens and you add an
  fq like fq=gender:mens. Then the user wants dress shoes
   so you submit another query fq=gender:mens&fq=style:dress.
  The first fq clause has already been calculated and cached so
  doesn't have to be re-calculated for the second query...
 
  But the stickiness is usually the way Solr is used, so this seems
  like a red herring.
 
  FWIW,
  Erick
 
  On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll
  lee.a.carr...@googlemail.com wrote:
   Hi
  
   We have the following solr http server
  
    <bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer"
    id="solrserver">
    <constructor-arg value="urlToSlaveLoadBalancer" />
    <property name="soTimeout" value="1000" />
    <property name="connectionTimeout" value="1000" />
    <property name="defaultMaxConnectionsPerHost" value="5" />
    <property name="maxTotalConnections" value="20" />
    <property name="allowCompression" value="true" />
    </bean>
  
    The issue we face is that the F5 balancer is returning a cookie which the
    client is hanging onto, resulting in the same slave being hit for all
    requests.
   
    One obvious solution is to configure the load balancer to be non-sticky;
    however, politically, a non-standard load balancer is timescale suicide.
    (It is an outsourced corporate thing.)
  
    I'm not keen to use the LBHttpSolrServer, as I don't want this to be a
    concern of the software, nor to maintain a list of servers etc. (although
    as a stop gap I may well have to).
  
    My question is: can I configure the CommonsHttpSolrServer to ignore
    client state? We are on Solr 3.4.
  
   Thanks in advance lee c
 



Re: Change config to use port 8080 instead of port 8983

2012-09-27 Thread Sami Siren
I just tried this with Tomcat and the props work for me. Did you wipe
out your zoo_data before starting with the additional system
properties?

here's how i ran it:

JAVA_OPTS="-DzkRun -DnumShards=1 -Djetty.port=8080 -Dbootstrap_conf=true -Dhost=127.0.0.1" bin/catalina.sh run

--
 Sami Siren



On Thu, Sep 27, 2012 at 9:47 PM, JesseBuesking jessebuesk...@gmail.com wrote:
 I've set the JAVA_OPTS you mentioned (-Djetty.port and -Dhost), but ZooKeeper
 still says that the node runs on port 8983 (clusterstate.json is the same).

 Would you happen to have any other suggestions that I could try?





Re: 4.0.snapshot to 4.0.beta index migration

2012-09-27 Thread vybe3142
Thanks, that's what we decided to do too.





Can SOLR Index UTF-16 Text

2012-09-27 Thread vybe3142
Our Solr setup (4.0.BETA on Tomcat 6) works as expected when indexing UTF-8
files. Recently, however, we noticed that it has issues with indexing
certain text files, e.g. UTF-16 files. See attachment for an example
(tarred+zipped):

tesla-utf16.txt
http://lucene.472066.n3.nabble.com/file/n4010834/tesla-utf16.txt

Looking at the text terms, I see 35 terms, i.e. (1, 2, 3, ..., 9, 0, a, b, c, ..., z)!
A UTF-8 version of this file indexes fine.
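
(The UTF-8 version is a straight conversion of the same content, e.g. via
iconv -f UTF-16 -t UTF-8 tesla-utf16.txt > tesla-utf8.txt on a Unix box.)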

Here's what the index analyzer looks like


Are UTF-16 text files supported? Any thoughts?

Thanks





Re: ExtractingRequestHandler causes Out of Memory Error

2012-09-27 Thread Lance Norskog
These are very large files and this is not enough memory. Do you upload these 
as files? 

If the CSV file is one document per line, you can split it up. Unix has a 
'split' command which does this very nicely. 
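
For example (the line count and file names are arbitrary; if the CSV has a
header row, remember to re-add it to each piece before posting):

split -l 100000 big.csv chunk_

That produces chunk_aa, chunk_ab, ... which you can index one at a time.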

- Original Message -
| From: Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp
| To: solr-user@lucene.apache.org
| Sent: Thursday, September 27, 2012 2:22:06 AM
| Subject: ExtractingRequestHandler causes Out of Memory Error
| 
| Hi guys,
| 
| 
| I use ManifoldCF to crawl files on a Windows file server and index
| them to Solr using the Extracting Request Handler.
| Most of the documents are successfully indexed but some fail, and an
| Out of Memory Error occurs in Solr, so I need some advice.
| 
| Those failed files are not so big; they are a CSV file of 240MB and a
| text file of 170MB.
| 
| Here is environment and machine spec:
| Solr 3.6 (also Solr4.0Beta)
| Tomcat 6.0
| CentOS 5.6
| java version 1.6.0_23
| HDD 60GB
| MEM 2GB
| JVM Heap: -Xmx1024m -Xms1024m
| 
| I feel there is enough memory that Solr should be able to extract and
| index
| file content.
| 
| Here is a Solr log below:
| --
| [solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError: Java heap space
| at java.util.Arrays.copyOf(Arrays.java:2882)
| at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
| at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
| at java.lang.StringBuilder.append(StringBuilder.java:189)
| at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
| at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
| at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
| at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
| at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
| at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
| at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
| at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
| at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
| at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
| at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
| at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
| at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
| at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
| at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
| at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
| at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
| at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
| at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
| at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
| at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
| at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
| at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
| at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
| at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
| at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
| at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
| at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
| 
| -
| 
| Anyone have any ideas?
| 
| Regards,
| 
| Shigeki
| 


Re: ExtractingRequestHandler causes Out of Memory Error

2012-09-27 Thread Jan Høydahl
Please try to increase -Xmx and see how much RAM you need for it to succeed.
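
For example, in the script or properties file that launches Tomcat (the
figures are illustrative; note the box only has 2GB of physical RAM, so
going much beyond this would mean adding memory):

JAVA_OPTS="$JAVA_OPTS -Xms1536m -Xmx1536m"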

I believe it is simply a case where this particular file needs double the
memory (480MB) to parse, and you have only allocated 1GB (which is not
particularly much). Perhaps the code could be optimized to avoid the
Arrays.copyOf() call.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 27 Sep 2012, at 11:22, Shigeki Kobayashi
shigeki.kobayas...@g.softbank.co.jp wrote:

 Hi guys,
 
 
 I use ManifoldCF to crawl files on a Windows file server and index them to
 Solr using the Extracting Request Handler.
 Most of the documents are successfully indexed but some fail, and an Out
 of Memory Error occurs in Solr, so I need some advice.
 
 Those failed files are not so big; they are a CSV file of 240MB and a
 text file of 170MB.
 
 Here is environment and machine spec:
 Solr 3.6 (also Solr4.0Beta)
 Tomcat 6.0
 CentOS 5.6
 java version 1.6.0_23
 HDD 60GB
 MEM 2GB
 JVM Heap: -Xmx1024m -Xms1024m
 
 I feel there is enough memory that Solr should be able to extract and index
 file content.
 
 Here is a Solr log below:
 [stack trace snipped; identical to the trace quoted earlier in the thread]
 
 Anyone have any ideas?
 
 Regards,
 
 Shigeki



Filter query not null or in list

2012-09-27 Thread Kiran J
Hi everyone,

I have a group field which restricts permissions for each user. A user
can belong to multiple groups. A document can belong to only one Group,
i.e. the field is not multivalued. There are some documents which are
unrestricted, hence their group id is null. How can I build the filter for
a given user so that it includes results from both Group=NULL and
Group=(X or Y or Z)? I tried something like this, but it doesn't work:

-Group:[* TO *] OR Group:(X OR Y OR Z)

Note that Group is a UUID field. Is it possible to assign a default
UUID value?

Any help is much appreciated.

Thanks
Kiran


Re: Filter query not null or in list

2012-09-27 Thread Jack Krupansky

Add a *:* before the negative query.

(*:* -Group:[* TO *]) OR Group:(X OR Y OR Z)
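
A purely negative clause by itself has nothing to subtract from; the
match-all *:* gives it the full document set to negate against. As a filter
query it would look like this (the group values are placeholder UUIDs, and
the value should be URL-encoded on the wire):

fq=(*:* -Group:[* TO *]) OR Group:(11111111-1111-1111-1111-111111111111 OR 22222222-2222-2222-2222-222222222222)

As for a default: if I recall correctly, the schema lets you put a default
attribute on the field, so indexing a fixed sentinel UUID for unrestricted
documents instead of leaving the field empty would also work, and then the
filter becomes a plain OR list.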

-- Jack Krupansky

-Original Message- 
From: Kiran J 
Sent: Thursday, September 27, 2012 8:07 PM 
To: solr-user@lucene.apache.org 
Subject: Filter query not null or in list 




RE: File content indexing

2012-09-27 Thread Zhang, Lisheng
Hi Erik,

I really meant to send this message earlier. I read the code and tested it;
your suggestion solved my problem. Really appreciated!

Thanks very much for your help, Lisheng

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Tuesday, September 18, 2012 5:04 PM
To: solr-user@lucene.apache.org
Subject: Re: File content indexing


Solr Cell can already do this. See the stream.file parameter and content stream
info on the wiki.
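
E.g. (URL, path, and id are placeholders; remote streaming has to be
enabled via enableRemoteStreaming="true" on <requestParsers> in
solrconfig.xml):

curl 'http://localhost:8983/solr/update/extract?stream.file=/data/docs/report.pdf&literal.id=doc1&commit=true'

Solr then reads the file straight from local disk instead of receiving the
bytes over HTTP.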

Erik

On Sep 18, 2012, at 19:56, Zhang, Lisheng lisheng.zh...@broadvision.com 
wrote:

 Hi, 
 
 Sorry I just sent out an unfinished message!
 
 Reading about Solr Cell, we index a file by first uploading it through HTTP
 to Solr; in my experience it is rather expensive to pass a big file through
 HTTP.
 
 If the file is local, maybe the better way is to pass the file path to Solr
 so that Solr can use the java.io API to get the file content; maybe this
 can be much faster?
 
 I am thinking of changing Solr a little to do this. Do you think this is a
 sensible thing to do (I know how to do it, but am not sure it would improve
 performance significantly)?
 
 Thanks very much for your help, Lisheng


Re: ExtractingRequestHandler causes Out of Memory Error

2012-09-27 Thread Shigeki Kobayashi
Hi Jan.

Thank you very much for your advice.

So I understand that Solr needs more memory to parse the files.
To parse a file of size x, it needs double the memory (2x). Then how much
heap should be allocated? 8x? 16x?

Regards,


Shigeki

2012/9/28 Jan Høydahl jan@cominvent.com

 Please try to increase -Xmx and see how much RAM you need for it to
 succeed.

 I believe it is simply a case where this particular file needs double the
 memory (480MB) to parse, and you have only allocated 1GB (which is not
 particularly much). Perhaps the code could be optimized to avoid the
 Arrays.copyOf() call.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 27 Sep 2012, at 11:22, Shigeki Kobayashi
 shigeki.kobayas...@g.softbank.co.jp wrote:

  Hi guys,
 
 
  I use ManifoldCF to crawl files on a Windows file server and index them to
  Solr using the Extracting Request Handler.
  Most of the documents are successfully indexed but some fail, and an Out
  of Memory Error occurs in Solr, so I need some advice.
 
  Those failed files are not so big; they are a CSV file of 240MB and a
  text file of 170MB.
 
  Here is environment and machine spec:
  Solr 3.6 (also Solr4.0Beta)
  Tomcat 6.0
  CentOS 5.6
  java version 1.6.0_23
  HDD 60GB
  MEM 2GB
  JVM Heap: -Xmx1024m -Xms1024m
 
  I feel there is enough memory that Solr should be able to extract and
 index
  file content.
 
  Here is a Solr log below:
  [stack trace snipped; identical to the trace quoted earlier in the thread]
 
  Anyone have any ideas?
 
  Regards,
 
  Shigeki




Re: Getting the distribution information of scores from query

2012-09-27 Thread Amit Nithian
Thanks! That did the trick! It did require some more work at the
component level to generate the same query key as the index searcher;
otherwise, when you try to fetch scores for a cached query result, you
get a lot of NPEs, since the stats are computed at the collector level,
which isn't set when a cache hit bypasses the Lucene level.
I'll write up what I did and will probably try to open source the work
for others to see. The PostFilter stuff is nice but needs some
examples and documentation; hopefully mine will help the cause.
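
For anyone who digs this thread up later, the collector piece boils down to
something like the following (a minimal sketch, assuming Solr 4.x's
org.apache.solr.search.DelegatingCollector API; the class name is mine). It
tracks mean and variance with Welford's online algorithm, which is all you
need to compute z = (score - mean) / stddev when blending:

import java.io.IOException;
import org.apache.lucene.search.Scorer;
import org.apache.solr.search.DelegatingCollector;

// Accumulates the running mean/variance of hit scores (Welford's online
// algorithm) while passing every hit through to the wrapped collector.
public class ScoreStatsCollector extends DelegatingCollector {
    private Scorer myScorer;
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the mean

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        this.myScorer = scorer;
        super.setScorer(scorer); // keep the delegate chain wired up
    }

    @Override
    public void collect(int doc) throws IOException {
        double score = myScorer.score();
        n++;
        double delta = score - mean;
        mean += delta / n;
        m2 += delta * (score - mean);
        super.collect(doc); // normal result collection continues
    }

    public long count() { return n; }
    public double mean() { return mean; }
    public double stdDev() { return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0; }
}

The PostFilter's getFilterCollector() would return an instance of this, and
the component reads the numbers back out after the search runs.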

Thanks again
Amit

On Wed, Sep 26, 2012 at 5:13 AM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 I suggest creating a component and putting it after QueryComponent. In
 prepare() it should add its own PostFilter to the list of request filters;
 your post filter will be able to inject its own DelegatingCollector, and
 then you can just add the collected histogram to the result named list.
  http://searchhub.org/dev/2012/02/10/advanced-filter-caching-in-solr/

 On Tue, Sep 25, 2012 at 10:03 PM, Amit Nithian anith...@gmail.com wrote:

 We have a federated search product that issues multiple parallel
 queries to Solr cores, fetches the results, and blends them. The
 approach we were investigating was taking the scores, normalizing them
 against some distribution (a normal distribution seems reasonable), and
 using that z-score to blend the results (otherwise you'd be
 blending scores on different scales). To accomplish this, I was
 looking to get the distribution of the scores for the query, as an
 analog to the stats component, but it seems the only way to
 accomplish this would be to create a custom collector that would
 accumulate and store this information (mean, std-dev, etc.), since the
 stats component only operates on indexed fields.

 Is there an easy way to tell Solr to use a custom collector without
 having to modify the SolrIndexSearcher class? Or is there an
 alternative way to get this information?

 Thanks
 Amit




 --
 Sincerely yours
 Mikhail Khludnev
 Tech Lead
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com