Re: Parallel SQL - column not found intermittent error

2017-06-14 Thread Yury Kats
I have seen this with very few indexed documents and multiple shards.
In such a case, some shards may not have any documents, and when the query
happens to hit such a shard, it does not find the fields it's looking for
and turns this into "column not found". If you resubmit the query and hit
a different shard (one with docs), the query will succeed.
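
A quick way to confirm this (a sketch, reusing the host and collection from the
thread below; distrib=false restricts the query to the core you hit) is to ask
each core for its local document count:

curl "http://server17:8984/solr/collection1/select?q=*:*&rows=0&distrib=false"

An empty core will report numFound=0.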

On 6/14/2017 11:42 AM, Susheel Kumar wrote:
> Yes, Joel.  Kind of every other command runs into this issue. I just
> executed the queries below and 3 of them failed while 1 succeeded.   I just
> have 6 documents ingested and no further indexing going on.  Let me know
> what else to check about the state of the index.
> 
> 
> ➜  solr-6.6.0 curl --data-urlencode 'stmt=SELECT  sr_sv_userFirstName as
> firstName, sr_sv_userLastName as lastName FROM collection1 ORDEr BY
> dv_sv_userLastName LIMIT 15'
> http://server17:8984/solr/collection1/sql\?aggregationMode\=facet
> 
> 
> {"result-set":{"docs":[{"EXCEPTION":"Failed to execute sqlQuery 'SELECT
>  sr_sv_userFirstName as firstName, sr_sv_userLastName as lastName FROM
> collection1 ORDEr BY dv_sv_userLastName LIMIT 15' against JDBC connection
> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT
>  sr_sv_userFirstName as firstName, sr_sv_userLastName as lastName FROM
> collection1 ORDEr BY dv_sv_userLastName LIMIT 15\": From line 1, column 9
> to line 1, column 27: Column 'sr_sv_userFirstName' not found in any
> table","EOF":true,"RESPONSE_TIME":85}]}}
> 
> 
> ➜  solr-6.6.0 curl --data-urlencode 'stmt=SELECT  sr_sv_userFirstName as
> firstName, sr_sv_userLastName as lastName FROM collection1 ORDEr BY
> dv_sv_userLastName LIMIT 15'
> http://server17:8984/solr/collection1/sql\?aggregationMode\=facet
> 
> 
> {"result-set":{"docs":[{"firstName":"Thiago","lastName":"Diego"},{"firstName":"John","lastName":"Jagger"},{"firstName":"John","lastName":"Jagger"},{"firstName":"John","lastName":"Johny"},{"firstName":"Isabel","lastName":"Margret"},{"firstName":"Isabel","lastName":"Margret"},{"EOF":true,"RESPONSE_TIME":241}]}}
> 
> 
> ➜  solr-6.6.0 curl --data-urlencode 'stmt=SELECT  sr_sv_userFirstName as
> firstName, sr_sv_userLastName as lastName FROM collection1 ORDEr BY
> dv_sv_userLastName LIMIT 15'
> http://server17:8984/solr/collection1/sql\?aggregationMode\=facet
> 
> 
> 
> {"result-set":{"docs":[{"EXCEPTION":"Failed to execute sqlQuery 'SELECT
>  sr_sv_userFirstName as firstName, sr_sv_userLastName as lastName FROM
> collection1 ORDEr BY dv_sv_userLastName LIMIT 15' against JDBC connection
> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT
>  sr_sv_userFirstName as firstName, sr_sv_userLastName as lastName FROM
> collection1 ORDEr BY dv_sv_userLastName LIMIT 15\": From line 1, column 9
> to line 1, column 27: Column 'sr_sv_userFirstName' not found in any
> table","EOF":true,"RESPONSE_TIME":87}]}}
> 
> On Wed, Jun 14, 2017 at 11:18 AM, Joel Bernstein  wrote:
> 
>> Are you able to reproduce the error, or is it just appearing in the logs?
>>
>> Do you know the state of index when it's occurring?
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Wed, Jun 14, 2017 at 11:09 AM, Susheel Kumar 
>> wrote:
>>
>>> I have set up Solr 6.6.0 locally (local ZK and Solr) and then on servers
>>> (3 ZK and 2 machines, 2 shards), and in both environments I see this
>>> intermittent error "column not found". The same query works sometimes
>>> and other times fails.
>>>
>>> Is that a bug or am I missing something...
>>>
>>>
>>> Console
>>> ===
>>>
>>> ➜  solr-6.6.0 curl --data-urlencode 'stmt=SELECT  dv_sv_userFirstName as
>>> firstName, dv_sv_userLastName as lastName FROM collection1 ORDEr BY
>>> dv_sv_userLastName LIMIT 15'
>>> http://server17:8984/solr/collection1/sql\?aggregationMode\=facet
>>>
>>> {"result-set":{"docs":[{"EXCEPTION":"Failed to execute sqlQuery 'SELECT
>>>  dv_sv_userFirstName as firstName, dv_sv_userLastName as lastName FROM
>>> collection1 ORDEr BY dv_sv_userLastName LIMIT 15' against JDBC connection
>>> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT
>>>  dv_sv_userFirstName as firstName, dv_sv_userLastName as lastName FROM
>>> collection1 ORDEr BY dv_sv_userLastName LIMIT 15\": From line 1, column 9
>>> to line 1, column 27: Column 'dv_sv_userFirstName' not found in any
>>> table","EOF":true,"RESPONSE_TIME":78}]}}
>>>
>>> ➜  solr-6.6.0 curl --data-urlencode 'stmt=SELECT  dv_sv_userFirstName as
>>> firstName, dv_sv_userLastName as lastName FROM collection1 ORDEr BY
>>> dv_sv_userLastName LIMIT 15'
>>> http://server17:8984/solr/collection1/sql\?aggregationMode\=facet
>>>
>>> {"result-set":{"docs":[{"EXCEPTION":"Failed to execute sqlQuery 'SELECT
>>>  dv_sv_userFirstName as firstName, dv_sv_userLastName as lastName FROM
>>> collection1 ORDEr BY dv_sv_userLastName LIMIT 15' against JDBC connection
>>> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT
>>>  dv_sv_userFirstName as firstName, dv_sv_userLastName as lastName FROM
>>> collection1 ORDEr BY 

Re: index multiple files into one index entity

2013-05-27 Thread Yury Kats
No, the implementation was very specific to my needs.

On 5/27/2013 8:28 AM, Alexandre Rafalovitch wrote:
 You did not open source it by any chance? :-)
 
 Regards,
Alex.



Re: CoreAdmin STATUS performance

2013-01-09 Thread Yury Kats
On 1/9/2013 10:38 AM, Shahar Davidson wrote:
 Hi All,
 
 I have a client app that uses SolrJ and needs to collect the names 
 (and just the names) of all loaded cores.
 I have about 380 Solr cores on a single Solr server (the net size of the 
 indices is about 220GB).
 
 Running the STATUS action takes about 800ms - that seems a bit too long, 
 given my requirements.
 
 So here are my questions:
 1) Is there any way to get _only_ the core Name of all cores?

If you have access to the filesystem, you could just read solr.xml where all 
cores are listed.
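
For example, a minimal sketch in Java (assuming the pre-5.0 solr.xml format
with <core name="..."/> entries, and a hypothetical path):

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ListCores {
    public static void main(String[] args) throws Exception {
        // Parsing solr.xml directly is much cheaper than a STATUS call
        // when only the core names are needed.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("/path/to/solr/solr.xml")); // hypothetical path
        NodeList cores = doc.getElementsByTagName("core");
        for (int i = 0; i < cores.getLength(); i++) {
            System.out.println(((Element) cores.item(i)).getAttribute("name"));
        }
    }
}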


Re: SolrJ | Add a date field to ContentStreamUpdateRequest

2012-12-30 Thread Yury Kats
On 12/30/2012 11:57 AM, uwe72 wrote:
 Hi there,
 
 how can I add a date field to a pdf document?

Same way you add the ID field, using the literal parameter.

 
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(pdfFile, "application/octet-stream");
up.setParam("literal." + SolrConstants.ID, solrPDFId);
 
 Regards
 Uwe
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SolrJ-Add-a-date-field-to-ContentStreamUpdateRequest-tp4029704.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: SolrJ | Add a date field to ContentStreamUpdateRequest

2012-12-30 Thread Yury Kats
On 12/30/2012 3:55 PM, uwe72 wrote:
 but I can only add String values. I want to add Date objects?!

You represent the Date as a String, in the format Solr uses for dates:
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/schema/DateField.html
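
For example, a sketch building on the ContentStreamUpdateRequest "up" from the
earlier message in this thread (the field name literal.myDate is hypothetical):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Solr expects dates in UTC, e.g. 1995-12-31T23:59:59Z
SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
up.setParam("literal.myDate", fmt.format(new Date()));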



Re: solr4.0 problem zkHost with multiple hosts throws out of range exception

2012-10-18 Thread Yury Kats
I'm pretty sure this problem has been there forever -- the parsing of zkHost is 
busted. I believe it's only been intended for example/demo purposes and 
therefore makes some assumptions about the value.
I haven't looked at the current code, but this is my recollection from about a 
year ago.





 From: Pascal freqresp...@pensa.fr
To: solr-user@lucene.apache.org 
Sent: Thursday, October 18, 2012 5:45 AM
Subject: solr4.0 problem zkHost with multiple hosts throws out of range 
exception
 
Hi there,

I've set up a test solr 4.0 cloud with some nodes; everything worked fine
until I tried to use more than one ZooKeeper instance.

If I put only one server it's OK, e.g.: java -DzkHost=10.0.0.1:9983 -DzkRun
-jar start.jar

But if I put more than one server in the zkHost param, an exception is thrown
immediately when parsing the zkHost parameter:

Example: java -DzkHost=10.0.0.1:9983,10.0.0.2:9983 -DzkRun -jar start.jar

[...]
SEVERE: null:java.lang.IllegalArgumentException: port out of range:-1
        at java.net.InetSocketAddress.<init>(InetSocketAddress.java:83)
        at java.net.InetSocketAddress.<init>(InetSocketAddress.java:63)
        at
org.apache.solr.cloud.SolrZkServerProps.setClientPort(SolrZkServer.java:315)
        at
org.apache.solr.cloud.SolrZkServerProps.getMySeverId(SolrZkServer.java:278)
        at
org.apache.solr.cloud.SolrZkServerProps.parseProperties(SolrZkServer.java:453)
        at
org.apache.solr.cloud.SolrZkServer.parseConfig(SolrZkServer.java:90)
        at
org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:208)
[...]

The "port out of range:-1" looks like the zkHost parameter isn't correctly
split as soon as I add a comma to it.

I tried to put hostnames instead of IPs, with no luck.

I tried to search this forum and the net but didn't find an answer. Any
idea?

Thanks,
Pascal



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr4-0-problem-zkHost-with-multiple-hosts-throws-out-of-range-exception-tp4014440.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: How to import a part of index from main Solr server(based on a query) to another Solr server and then do incremental import at intervals later(the updated index)?

2012-10-14 Thread Yury Kats
You can merge indexes. You cannot split them. 

jefferyyuan yuanyun...@gmail.com wrote:

Thanks for the reply, but I think SolrReplication may not help in this case,
as we don't want to replicate all indexes to solr2, just a part of the
index (the index of docs created by me). It seems SolrReplication doesn't
support replicating a part of an index (based on a query) to the slave.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-import-a-part-of-index-from-main-Solr-server-based-on-a-query-to-another-Solr-server-and-then-tp4013479p4013580.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Extract multiple streams into the same document

2012-10-09 Thread Yury Kats
Answering my own question, for the archive's sake:
I worked this out by creating my own UpdateRequestProcessor.

On 10/4/2012 2:35 PM, Yury Kats wrote:
 I'm sending streams of data to Solr, using ExtractingRequestHandler to be 
 parsed/extracted by Tika and then indexed.
 
 While multiple streams can be passed with a single request to Solr, each 
 stream ends up being indexed into a separate document.
 Or, if I pass the unique id parameter with the request (as literal.id 
 parameter), the very last stream ends up overwriting all
 other streams within the same request, since each one is being indexed into 
 a new document with the same id.
 
 I'm looking for a way to have multiple streams indexed into the same 
 document. I have a content field defined for extraction
 (using fmap.content parameter) and the field is defined as multiValued in the 
 schema. I would like all streams from the request to be
 indexed as different values of that multiValued content field in the same 
 document.
 
 Any hints or ideas are appreciated.
 
 Thanks,
 Yury
 



Extract multiple streams into the same document

2012-10-04 Thread Yury Kats
I'm sending streams of data to Solr, using ExtractingRequestHandler to be 
parsed/extracted by Tika and then indexed.

While multiple streams can be passed with a single request to Solr, each stream 
ends up being indexed into a separate document.
Or, if I pass the unique id parameter with the request (as literal.id 
parameter), the very last stream ends up overwriting all
other streams within the same request, since each one is being indexed into a 
new document with the same id.

I'm looking for a way to have multiple streams indexed into the same document. 
I have a content field defined for extraction
(using fmap.content parameter) and the field is defined as multiValued in the 
schema. I would like all streams from the request to be
indexed as different values of that multiValued content field in the same 
document.

Any hints or ideas are appreciated.

Thanks,
Yury


Re: missing core name in path

2012-08-16 Thread Yury Kats
On 8/16/2012 6:57 AM, Muzaffer Tolga Özses wrote:
 
 Also, below are the lines I got when starting it:
 
 SEVERE: org.apache.solr.common.SolrException: Schema Parsing Failed: 
 multiple points
 ...
 Caused by: java.lang.NumberFormatException: multiple points
  at 
 sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1082)

This looks like the version number at the top of the schema has more than
one dot, e.g. "1.2.3". Solr parses the version as a floating-point number,
so it must be "1.23" instead.



Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Yury Kats
On 7/18/2012 7:11 PM, Briggs Thompson wrote:
 I have realized this is not specific to SolrJ but to my instance of Solr. 
 Using curl to delete by query is not working either. 

Can be this: https://issues.apache.org/jira/browse/SOLR-3432


Re: Could I use Solr to index multiple applications?

2012-07-17 Thread Yury Kats
On 7/17/2012 9:26 PM, Zhang, Lisheng wrote:
 Thanks very much for quick help! Multicore sounds interesting,
 I roughly read the doc, so we need to put each core name into
 Solr config XML, if we add another core and change XML, do we
 need to restart Solr?

You can add/create cores on the fly, without restarting.
See http://wiki.apache.org/solr/CoreAdmin#CREATE
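
For example, a sketch with SolrJ (URL and core names hypothetical):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

// Point the server at the Solr root, not at a core.
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
// Creates a core named "core1" with instanceDir "core1", no restart needed.
CoreAdminRequest.createCore("core1", "core1", server);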


multiValued false-true

2012-07-12 Thread Yury Kats
I have an indexed, not stored, not multiValued field in the schema.

If I change this field to be multiValued, would I need to re-index
everything, or would all existing documents (that were indexed while
the field was not multiValued) still be queryable?

Thanks,
Yury


Re: Sort by date field = outofmemory?

2012-07-11 Thread Yury Kats
This solves the problem by allocating memory up front, instead of at some
point later when the JVM needs it. At that later point in time, there may not
be enough free memory left on the system to allocate.

On 7/11/2012 11:04 AM, Michael Della Bitta wrote:
 There is a school of thought that suggests you should always set Xms
 and Xmx to the same thing if you expect your heap to hit Xmx. This
 results in your process only needing to allocate the memory once,
 rather than in a series of little allocations as the heap expands.
 
 I can't explain how this fixed your problem, but just a datapoint that
 might suggest that doing what you did is not such a bad thing.
 
 Michael Della Bitta
 
 
 Appinions, Inc. -- Where Influence Isn’t a Game.
 http://www.appinions.com
 
 
 On Wed, Jul 11, 2012 at 4:05 AM, Bruno Mannina bmann...@free.fr wrote:
 Hi, some news this morning...

 I added the -Xms1024m option and now it works?! No outofmemory?!

 java -jar -Xms1024m -Xmx2048m start.jar

 On 11/07/2012 09:55, Bruno Mannina wrote:

 Hi Yury,

 Thanks for your answer.
 
 OK to increase the memory, but I have a problem with that:
 I have 8GB on my computer but the JVM accepts only 2GB max with the
 -Xmx option. Is that normal?
 
 Thanks,
 Bruno

 On 11/07/2012 03:42, Yury Kats wrote:

 Sorting is a memory-intensive operation indeed.
 Not sure what you are asking, but it may very well be that your
 only option is to give JVM more memory.

 On 7/10/2012 8:25 AM, Bruno Mannina wrote:

 Dear Solr Users,

 Each time I try to do a request with sort=pubdate+desc

 I get:
 GRAVE: java.lang.OutOfMemoryError: Java heap space

 I use Solr3.6, I have around 80M docs and my request gets around 160
 results.

 Actually for my test, i use jetty

 java -jar -Xmx2g start.jar

 PS: If I write 3g I get an error; I have 8GB RAM

 Thanks a lot for your help,
 Bruno


Re: query syntax to find ??? chars

2012-07-11 Thread Yury Kats
On 7/11/2012 2:55 PM, Alexander Aristov wrote:

 content:?? doesn't work :)

I would try escaping them: content:\?\?\?\?\?\?
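
If the query is built with SolrJ, a sketch of the same idea (escapeQueryChars
escapes ? and *, among other special characters):

import org.apache.solr.client.solrj.util.ClientUtils;

// Produces \?\?\?\?\?\? -- safe to embed as content:<escaped>
String escaped = ClientUtils.escapeQueryChars("??????");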





Re: Sort by date field = outofmemory?

2012-07-10 Thread Yury Kats
Sorting is a memory-intensive operation indeed.
Not sure what you are asking, but it may very well be that your
only option is to give JVM more memory.

On 7/10/2012 8:25 AM, Bruno Mannina wrote:
 Dear Solr Users,
 
 Each time I try to do a request with sort=pubdate+desc
 
 I get:
 GRAVE: java.lang.OutOfMemoryError: Java heap space
 
 I use Solr3.6, I have around 80M docs and my request gets around 160 
 results.
 
 Actually for my test, i use jetty
 
 java -jar -Xmx2g start.jar
 
 PS: If I write 3g I get an error; I have 8GB RAM
 
 Thanks a lot for your help,
 Bruno
 
 




Re: get number of cores

2012-06-25 Thread Yury Kats
On 6/25/2012 8:40 AM, Yuval Dotan wrote:
 Hi
 Is there a *programmatic* (Java) way to connect to the Solr server (using
 solrj probably) and get the number of cores and core names?

A STATUS admin request will give you all available cores, with their names.
http://wiki.apache.org/solr/CoreAdmin#STATUS
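
For example, a sketch with SolrJ (URL hypothetical; passing null asks for the
status of all cores):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
CoreAdminResponse status = CoreAdminRequest.getStatus(null, server);
// Each entry in the core-status list is keyed by the core name.
for (int i = 0; i < status.getCoreStatus().size(); i++) {
    System.out.println(status.getCoreStatus().getName(i));
}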


Re: Solr v3.5.0 - numFound changes when paging through results on 8-shard cluster

2012-06-19 Thread Yury Kats
On 6/19/2012 4:06 PM, Justin Babuscio wrote:
 Solr v3.5.0
 8 Master Shards
 2 Slaves Per Master
 
 Confirming that there are no active records being written, the numFound
 value is decreasing as we page through the results.
 
 For example,
 Page1 - numFound = 3683
 Page2 - numFound = 3683
 Page3 - numFound = 3683
 Page4 - numFound = 2866
 Page5 - numFound = 2419
 Page5 - numFound = 1898
 Page6 - numFound = 1898
 ...
 PageN - numFound = 1898
 
 
 
 It looks like it eventually settles on the real count.  Is this a
 limitation when using a distributed cluster, or is numFound always
 intended to give an approximation, similar to how Google responds with
 total hits?

numFound should return the real count for any given query.
How are you specifying which shards/cores to use for each query?
Does this change between queries?



Re: SolrCloud and split-brain

2012-06-15 Thread Yury Kats
On 6/15/2012 12:49 PM, Otis Gospodnetic wrote:
 Hi,
 
 How exactly does SolrCloud handle split brain situations?
 
 Imagine a cluster of 10 nodes.
 Imagine 3 of them being connected to the network by some switch and imagine 
 the out port of this switch dies.
 When that happens, these 3 nodes will be disconnected from the other 7 nodes 
 and we'll have 2 clusters, one with 3 nodes and one with 7 nodes and we'll 
 have a split brain situation.  
 Imagine we had 3 ZK nodes in the original 10-node cluster, 2 of which are 
 connected to the dead switch and are thus aware only of the 3 node cluster 
 now, and 1 ZK instance which is on a different switch and is thus aware only 
 of the 7 node cluster.
 
 At this point how exactly does ZK make SolrCloud immune to split brain?

A quorum of N/2+1 ZK nodes is required to operate (that's also the reason you
need at least 3 to begin with).


Re: LockObtainFailedException after trying to create cores on second SolrCloud instance

2012-06-14 Thread Yury Kats
On 6/14/2012 2:05 AM, Daniel Brügge wrote:
 Will check later to use different data dirs for the core on
 each instance.
 But because each Solr sits in its own openvz instance (a virtual
 server, respectively) they should be totally separated. At least
 from my understanding of virtualization.

Depending on how your VMs are configured, their filesystems could
be mapped to the same place of the host's filesystem. What you describe
sounds like this is the case.


Re: copyField

2012-05-18 Thread Yury Kats
On 5/18/2012 9:54 AM, Tolga wrote:
 Hi,
 
 I've put the line <copyField source="*" dest="text" stored="true" 
 indexed="true"/> in my schema.xml and restarted Solr, crawled my 
 website, and indexed (I've also committed, but do I really have to 
 commit?). But I still have to search with content:mykeyword at the admin 
 interface. What do I have to do so that I can search with just mykeyword?

Do you have the default field defined?



Re: copyField

2012-05-18 Thread Yury Kats
On 5/18/2012 4:02 PM, Tolga wrote:
 Default field? I'm not sure but I think I do. Will have to look. 

http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field


Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
I'm running into a problem with queries that contain forward slashes and more 
than one field.

For example, these queries work fine:
fieldName:/a
fieldName:/*

But if I have two fields with similar syntax in the same query, it fails.

For simplicity, I'm using the same field twice:

fieldName:/a fieldName:/a

results in: no field name specified in query and no defaultSearchField defined 
in schema.xml

SEVERE: org.apache.solr.common.SolrException: no field name specified in query 
and no defaultSearchField defined in schema.xml
at 
org.apache.solr.search.SolrQueryParser.checkNullField(SolrQueryParser.java:106)
at 
org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:124)
at 
org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1058)
at 
org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358)
at 
org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257)
at 
org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212)
at 
org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
at 
org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)


fieldName:/* fieldName:/*

results in: null

java.lang.NullPointerException
at 
org.apache.solr.schema.IndexSchema$DynamicReplacement.matches(IndexSchema.java:747)
at 
org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1026)
at org.apache.solr.schema.IndexSchema.getFieldType(IndexSchema.java:980)
at 
org.apache.solr.search.SolrQueryParser.getWildcardQuery(SolrQueryParser.java:172)
at 
org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1039)
at 
org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358)
at 
org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257)
at 
org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212)
at 
org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
at 
org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)


Any ideas as to what may be wrong and how can I make these work?

I'm on a 4.0 snapshot from Nov 29, 2011.


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
On 2/22/2012 12:25 PM, Yury Kats wrote:
 I'm running into a problem with queries that contain forward slashes and more 
 than one field.
 
 For example, these queries work fine:
 fieldName:/a
 fieldName:/*
 
 But if I have two fields with similar syntax in the same query, it fails.
 
 For simplicity, I'm using the same field twice:
 
 fieldName:/a fieldName:/a

Looks like escaping forward slashes makes the query work, eg
  fieldName:\/a fieldName:\/a

This is a bit puzzling as the forward slash is not part of the query language, 
is it?


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
On 2/22/2012 1:05 PM, Em wrote:
 Yury,
 
 are you sure your request has a proper url-encoding?

Yes


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
On 2/22/2012 1:25 PM, Em wrote:
 That's strange.
 
 Could you provide a sample dataset?

Data set does not matter. The query fails to parse, long before it gets to the 
data.


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
On 2/22/2012 1:24 PM, Yonik Seeley wrote:

 This is a bit puzzling as the forward slash is not part of the query 
 language, is it?
 
 Regex queries were added that use forward slashes:
 
 https://issues.apache.org/jira/browse/LUCENE-2604

Oh, so / is a special character now? I don't think it is mentioned as such on 
any of the wiki pages,
or in org.apache.solr.client.solrj.util.ClientUtils


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
On 2/22/2012 1:24 PM, Yonik Seeley wrote:
 Looks like escaping forward slashes makes the query work, eg
  fieldName:\/a fieldName:\/a

 This is a bit puzzling as the forward slash is not part of the query 
 language, is it?
 
 Regex queries were added that use forward slashes:
 
 https://issues.apache.org/jira/browse/LUCENE-2604

Looks like regex matching happens across multiple fields though. Feels like a 
bug to me?


Re: no such core error with EmbeddedSolrServer

2012-01-06 Thread Yury Kats
On 1/6/2012 9:57 AM, Phillip Rhodes wrote:
 On Fri, Jan 6, 2012 at 3:06 AM, Sven Maurmann s...@kippdata.de wrote:
 Hi,

 from your snippets the reason is not completely clear. There are a number of 
 reasons for not starting up the
 server. For example in case of a faulty configuration of the core 
 (solrconfig.xml, schema.xml) the core does
 not start and you get the reported error.
 
 Yeah, that I noticed... I had some such errors earlier, that I noticed
 when starting the Solr / Jetty standalone instance, but those have
 been resolved, and now I can launch Solr as a process, and use the
 SolrJ implementation that talks http to it - from my program - and
 everything works as expected.  But still no joy with the
 EmbeddedSolrServer.  :-(

Have you tried passing the core name ("collection1") to the c'tor, instead
of the empty string?


Re: no such core error with EmbeddedSolrServer

2012-01-06 Thread Yury Kats
On 1/6/2012 10:19 AM, Phillip Rhodes wrote:
 2012/1/6 Yury Kats yuryk...@yahoo.com:

 Have you tried passing core name (collection1) to the c'tor, instead
 of the empty string?
 
 Yep, but that gives the same error (with the core name appended) such
 as no such core: collection1

That probably means the Solr home is not set properly, so it can't find solr.xml.



Re: Replication not working

2011-12-22 Thread Yury Kats
On 12/22/2011 4:39 AM, Dean Pullen wrote:
 Yeh the drop index via the URL command doesn't help anyway - when rebuilding 
 the index the timestamp is obviously ahead of master (as the slave is being 
 created now) so the replication will still not happen. 

If you delete the index and create the core anew, the index version will be 0
and replication will work.


Core overhead

2011-12-15 Thread Yury Kats
Does anybody have an idea, or better yet, measured data,
to see what the overhead of a core is, both in memory and speed?

For example, what would be the difference between having 1 core
with 100M documents versus having 10 cores with 10M documents?


Re: Core overhead

2011-12-15 Thread Yury Kats
On 12/15/2011 1:07 PM, Robert Stewart wrote:

 I think overall memory usage would be close to the same.

Is this really so? I suspect that the consumed memory is in direct
proportion to the number of terms in the index. I also suspect that
if I divided 1 core with N terms into 10 smaller cores, each smaller
core would have much more than N/10 terms. Let's say I'm indexing
English texts, it's likely that all smaller cores would have almost
the same number of terms, close to the original N. Not so?


Re: Core overhead

2011-12-15 Thread Yury Kats
On 12/15/2011 1:41 PM, Robert Petersen wrote:
 loading.  Try it out, but make sure that the functionality you are
 actually looking for isn't sharding instead of multiple cores...  

Yes, but the way to achieve sharding is to have multiple cores.
The question then becomes -- how many cores (shards)?


Re: Core overhead

2011-12-15 Thread Yury Kats
On 12/15/2011 4:46 PM, Robert Petersen wrote:
 Sure that is possible, but doesn't that defeat the purpose of sharding?
 Why distribute across one machine?  Just keep all in one index in that
 case is my thought there...

To be able to scale w/o re-indexing. Also often referred to as micro-sharding.


Re: Virtual Memory very high

2011-12-13 Thread Yury Kats
On 12/13/2011 6:16 AM, Dmitry Kan wrote:
 If you allow me to chime in, is there a way to check which
 DirectoryFactory is in use, if
 ${solr.directoryFactory:solr.StandardDirectoryFactory} has been configured?

I think you can get the currently used factory in a Luke response, if you hit 
your Solr server with a Luke request,
eg http://localhost:8983/solr/admin/luke


 
 Dmitry
 
 2011/12/12 Yury Kats yuryk...@yahoo.com
 
 On 12/11/2011 4:57 AM, Rohit wrote:
 What is the difference between the different DirectoryFactory implementations?


 http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html

 http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html

 



Re: Virtual Memory very high

2011-12-12 Thread Yury Kats
On 12/11/2011 4:57 AM, Rohit wrote:
 What is the difference between the different DirectoryFactory implementations?

http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html
http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html


Re: Virtual Memory very high

2011-12-10 Thread Yury Kats
On 12/9/2011 11:54 PM, Rohit wrote:
 Hi All,
 
  
 
 Don't know if this question is directly related to this forum. I am running
 Solr in Tomcat on a Linux server. The moment I start Tomcat, the virtual memory
 shown using the top command goes to its max, 31.1G, and then remains there.
 
 Is this the right behaviour? Why is the virtual memory usage so high? I have
 36GB of RAM on the server.

To limit VIRT memory, change the DirectoryFactory in solrconfig.xml to use
solr.NIOFSDirectoryFactory.


Re: Delete by Query with limited number of rows

2011-11-12 Thread Yury Kats
On 11/12/2011 4:08 PM, mikr00 wrote:
 Similar to a first in first out list. The problem is: It's easy to check the
 limit, but how can I delete the oldest documents to go again below the
 limit? Can I do it with a delete by query request? In that case, I would
 probably have to limit the number of rows? But I can't seem to find a way to
 do that. Or would you see a different solution (maybe there is a way to
 configure the solr core such that it automatically behaves as described?)?

You can certainly delete a set of documents using delete by query,
but you need to somehow identify what documents you want to have deleted.
For that, you'd need to have a field, such as a sequence number or a timestamp
when the document was added.

Alternatively, if you can control the uniqueKey field when adding documents,
you can just cycle it between 1 and 1,000,000. When you reach 1,000,000
set the uniqueKey back to 1 and keep adding. The new document will automatically
replace the old document with the key of 1.
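
For example, a sketch of the first approach with SolrJ, assuming a hypothetical
numeric "seq" field that grows with every added document:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
// Find the 1,000,000th newest document; everything older gets deleted.
SolrQuery sq = new SolrQuery("*:*");
sq.setSortField("seq", SolrQuery.ORDER.desc);
sq.setStart(999999);
sq.setRows(1);
QueryResponse qr = server.query(sq);
if (qr.getResults().getNumFound() > 1000000) {
    long cutoff = ((Number) qr.getResults().get(0).getFieldValue("seq")).longValue();
    server.deleteByQuery("seq:[* TO " + (cutoff - 1) + "]");
    server.commit();
}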


Re: Default value for dynamic fields

2011-11-03 Thread Yury Kats
On 11/3/2011 12:59 PM, Milan Dobrota wrote:
 Is there any way to define the default value for the dynamic fields in
 SOLR? I use some dynamic fields of type float with _val_ and if they
 haven't been created at index time, the value defaults to 0. I would want
 this to be 1. Can that be changed?

Does specifying default="1" not work?
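
I.e., something along these lines in schema.xml (a sketch; field name and type
are hypothetical, and the default only applies to documents indexed after the
change):

<dynamicField name="*_rank" type="float" indexed="true" stored="true" default="1"/>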



Re: shard indexing

2011-11-02 Thread Yury Kats
There's a defaultCoreName attribute in solr.xml that lets you specify which core 
should be used when none is specified in the URL. You can change that every 
time you create a new core.




From: Vadim Kisselmann v.kisselm...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, November 2, 2011 6:16 AM
Subject: Re: shard indexing

Hello Jan,

thanks for your quick response.

It's quite difficult to explain:
We want to create new shards on the fly every month and switch the default
shard to the newest one.
We always want to index to the newest shard with the same update URL,
like http://localhost:8983/solr/update (content stream).

Is our idea possible to implement?

Thanks in advance.
Regards

Vadim





2011/11/2 Jan Høydahl jan@cominvent.com

 Hi,

 The only difference is the core name in the URL, which should be easy
 enough to handle from your indexing client code. I don't really understand
 the reason behind your request. How would you control which core to index
 your document to if you did not specify it in the URL?

 You could name ONE of your cores as ".", meaning it would be the default
 core living at /solr/update; perhaps that is what you're looking for?

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote:

  Hello folks,
  I have a problem with shard indexing.
 
  with a single core I use this update command:
  http://localhost:8983/solr/update .
 
  now I have 2 shards, we can call them core0 / core1
  http://localhost:8983/solr/core0/update .
 
 
  can I adjust anything to index in the same way as with a single core,
  without the core name?
 
  thanks and regards
  vadim






Re: Solr Replication: relative path in confFiles Element?

2011-10-25 Thread Yury Kats
On 10/25/2011 11:24 AM, Mark Schoy wrote:
 Hi,
 
 is it possible to define a relative path in confFiles?
 
 For example:
 
 <str name="confFiles">../../x.xml</str>
 
 If yes, to which location will the file be copied at the slave?

I don't think it's possible. Replication copies confFiles from the master core's
conf dir to the slave core's conf dir.



Re: Merging Remote Solr Indexes?

2011-10-20 Thread Yury Kats
On 10/19/2011 5:15 PM, Darren Govoni wrote:
 Hi Otis,
 Yeah, I saw that page, but it says it's for merging cores, which I presume 
 must reside locally to the solr instance doing the merging?
 What I'm interested in doing is merging across solr instances running on 
 different machines into a single solr running on
 another machine (programmatically). Is it still possible or did I 
 misread the wiki?

Possible, but in a few steps.
1. Create new cores on the target machine.
2. Replicate them from the source machines.
3. Merge them on the target machine.

All 3 steps can be done programmatically.
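
For example, a sketch with SolrJ run against the target machine (all names and
URLs hypothetical, assuming a SolrJ version that has CoreAdminRequest.mergeIndexes):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.request.QueryRequest;

CommonsHttpSolrServer admin = new CommonsHttpSolrServer("http://target:8983/solr");

// 1. Create a local core to receive a remote index.
CoreAdminRequest.createCore("src1", "src1", admin);

// 2. Pull the remote index via the replication handler's fetchindex command.
CommonsHttpSolrServer src1 = new CommonsHttpSolrServer("http://target:8983/solr/src1");
SolrQuery fetch = new SolrQuery();
fetch.set("command", "fetchindex");
fetch.set("masterUrl", "http://remote1:8983/solr/core/replication");
QueryRequest req = new QueryRequest(fetch);
req.setPath("/replication");
req.process(src1);
// ...repeat steps 1 and 2 for src2 against the second machine...

// 3. Merge the pulled cores into the destination core.
CoreAdminRequest.mergeIndexes("dest", new String[0],
        new String[] { "src1", "src2" }, admin);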


Re: Issue with Shard configuration in solrconfig.xml (Solr 3.1)

2011-10-20 Thread Yury Kats
On 10/20/2011 11:33 AM, Rahul Warawdekar wrote:
 Hi,
 
 I am trying to evaluate distributed search for my project by splitting up
 our single index on 2 shards with Solr 3.1
 When I query the first solr server by passing the shards parameter, I get
 correct search results from both shards.
 (
 http://server1:8080/solr/test/select/?shards=server1:8080/solr/test,server2:8080/solr/testq=solrstart=0rows=20
 )
 
 I want to avoid the use of this shards parameter in the http url and specify
 it in solrconfig.xml as follows.
 
 <requestHandler name="my_custom_handler" class="solr.SearchHandler"
 default="true">
 <str name="shards">server1:8080/solr/test,server2:8080/solr/test</str>
 ..
 </requestHandler>

Don't you need to wrap it in <lst name="defaults"> or <lst name="appends">?

 After adding the shards parameter in solrconfig.xml, I get search results
 only from the first shard and not from the from the second one.
 Am I missing any configuration ?

This means your 'shards' parameter is not being used, because it's not 
specified properly.

 Also, can the urls with the shard parameter be load balanced for a failover
 mechanism ?

See SolrCloud http://wiki.apache.org/solr/SolrCloud


Re: SolrJ + Post

2011-10-14 Thread Yury Kats
On 10/14/2011 9:29 AM, Rohit wrote:
 I want to use POST instead of GET while using solrj, but I am unable to
 find a clear example for it. If anyone has implemented the same it would be
 nice to get some insight.

To do what? Submit? Query? How do you use SolrJ now?


Re: SolrJ + Post

2011-10-14 Thread Yury Kats
On 10/14/2011 12:11 PM, Rohit wrote:
 I want to query, right now I use it in the following way,
 
 CommonsHttpSolrServer server = new CommonsHttpSolrServer("URL HERE");
 SolrQuery sq = new SolrQuery();
 sq.add("q", query);
 QueryResponse qr = server.query(sq);

QueryResponse qr = server.query(sq, METHOD.POST);
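
(METHOD here is the org.apache.solr.client.solrj.SolrRequest.METHOD enum.)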


Re: basic solr cloud questions

2011-09-30 Thread Yury Kats
On 9/30/2011 12:26 PM, Pulkit Singhal wrote:
 SOLR-2355 is definitely a step in the right direction but something I
 would like to get clarified:

Questions about SOLR-2355 are best asked in SOLR-2355 :)
 b) Does this basic implementation distribute across shards or across
 cores? 

From a brief look, it seems to assume shard=core. You list
all cores in the config file under shards.



Re: SolrCloud: is there a programmatic way to create an ensemble

2011-09-29 Thread Yury Kats
Nope

On 9/29/2011 12:17 AM, Pulkit Singhal wrote:
 Did you find out about this?
 
 2011/8/2 Yury Kats yuryk...@yahoo.com:
 I have multiple SolrCloud instances, each running its own Zookeeper
 (Solr launched with -DzkRun).

 I would like to create an ensemble out of them. I know about -DzkHost
 parameter, but can I achieve the same programmatically? Either with
 SolrJ or REST API?

 Thanks,
 Yury

 



Re: basic solr cloud questions

2011-09-29 Thread Yury Kats
On 9/29/2011 7:22 AM, Darren Govoni wrote:
 That was kinda my point. The new cloud implementation
 is not about replication, nor should it be. But rather about
 horizontal scalability where nodes manage different parts
 of a unified index. 

It's about many things. You stated one, but there are other goals,
one of them being tolerance to node outages. In a cloud, when
one of your many nodes fails, you don't want to stop querying and
indexing. For this to happen, you need to maintain redundant copies
of the same pieces of the index, hence you need to replicate.

 One of the design goals of the new cloud
 implementation is for this to happen more or less automatically.

True, but there is a big gap between goals and current state.
Right now, there is distributed search, but not distributed indexing
or auto-sharding, or auto-replication. So if you want to use the SolrCloud
now (as many of us do), you need do a number of things yourself,
even if they might be done by SolrCloud automatically in the future.

 To me that means one does not have to manually distribute
 documents or enforce replication as Yury suggests.
 Replication is different to me than what was being asked.
 And perhaps I misunderstood the original question.
 
 Yury's response introduced the term core where the original
 person was referring to nodes. For all I know, those are two
 different things in the new cloud design terminology (I believe they are).
 
 I guess understanding cores vs. nodes vs shards is helpful. :)

A shard is a slice of the index. An index is managed/stored in a core.
Nodes are Solr instances, usually physical machines.

Each node can host multiple shards, and each shard can consist of multiple 
cores.
However, all cores within the same shard must have the same content.

This is where the OP ran into the problem. The OP had 1 shard, consisting of two
cores on two nodes. Since there is no distributed indexing yet, all documents 
were
indexed into a single core. However, there is distributed search, therefore 
queries
were sent randomly to different cores of the same shard. Since one core in the 
shard
had documents and the other didn't, the query result was random.

To solve this problem, the OP must make sure all cores within the same shard 
(be they
on the same node or not) have the same content. This can currently be achieved 
by:
a) setting up replication between the cores: you index into one core and the
other core replicates the content
b) indexing into both cores

Hope this clarifies.


Re: basic solr cloud questions

2011-09-27 Thread Yury Kats
On 9/27/2011 5:16 PM, Darren Govoni wrote:
 On 09/27/2011 05:05 PM, Yury Kats wrote:
 You need to either submit the docs to both nodes, or have a replication
 setup between the two. Otherwise they are not in sync.
 I hope that's not the case. :/ My understanding (or hope maybe) is that 
 the new Solr Cloud implementation will support auto-sharding and 
 distributed indexing. This means that shards will receive different 
 documents regardless of which node received the submitted document 
 (spread evenly based on a hash-node assignment). Distributed queries 
 will thus merge all the solr shard/node responses.

All cores in the same shard must somehow have the same index.
Only then can you continue servicing searches when individual cores
fail. Auto-sharding and distributed indexing don't have anything to
do with this.

In the future, SolrCloud may be managing replication between cores
in the same shard automatically. But right now it does not.


Re: two cores but have single result set in solr

2011-09-24 Thread Yury Kats
On 9/24/2011 3:09 AM, hadi wrote:
 I do not know how to search both cores without defining the shards
 parameter. Could you show me some solutions to solve my issue?

See this: http://wiki.apache.org/solr/DistributedSearch
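
For example (core names hypothetical):

http://localhost:8983/solr/core0/select?q=mykeyword&shards=localhost:8983/solr/core0,localhost:8983/solr/core1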


Re: two cores but have single result set in solr

2011-09-23 Thread Yury Kats
On 9/23/2011 6:00 PM, hadi wrote:
 I index my files with solrj and crawl my sites with nutch 1.3. As you
 know, I have to overwrite the Solr schema with the nutch schema in order
 to view the results in solr/browse. In this case I should define two
 cores, but I want to have a single result set, or let the user search
 both core indexes at the same time.

Can you not use the 'shards' parameter and specify both cores there?



How to check if replication is running

2011-09-16 Thread Yury Kats
Let's say I'm forcing a replication of a core using the fetchindex command.
No new content is being added to the master.

I can check whether replication has finished by periodically querying
master and slave for their indexversion and comparing the two.

But what's the best way to check if replication is actually happening
and hasn't been dropped if, for example, there was a network outage
between the master and the slave, in which case I want to re-start
replication.

Thanks,
Yury



Re: How to check if replication is running

2011-09-16 Thread Yury Kats
On 9/16/2011 4:58 PM, Brandon Fish wrote:
 Hi Yury,
 
 You could try checking out the details command of the replication handler:
 http://slave_host:port/solr/replication?command=details
 which has information such as isReplicating.

How reliable is isReplicating? Is it updated on unexpected failures or only
during normal operation? Eg, if both servers were powered down and then up,
would it be false?

 You could also look at the script attached to [1] which shows a
 thorough check of a slave's replication status, which could be polled to
 trigger a restart if there is an error.
 [1] https://issues.apache.org/jira/browse/SOLR-1855

Thanks, that's very helpful. I see that it ultimately checks for a 2-hour
threshold, which implies that other means of checking may not be 100% reliable.
Is that so?


Re: Can index size increase when no updates/optimizes are happening?

2011-09-15 Thread Yury Kats
On 9/14/2011 2:36 PM, Erick Erickson wrote:
 What is the machine used for? Was your user looking at
 a master? Slave? Something used for both?

Stand-alone machine with multiple Solr cores. No replication.

 Measuring the size of all the files in the index? Or looking
 at memory?

Disk space.

 The index files shouldn't be getting bigger unless there
 were indexing operations going on. 

That's what I thought.

 Is it at all possible that
 DIH was configured to run automatically (or any other
 indexing job for that matter) and your user didn't realize it?

There's no DIH, but there is a custom app that submits docs
for indexing via SolrJ. Supposedly, Solr logs were not showing
any updates over night, so the assumption was that no new docs
were added.

I'd write it off as a user error, but wanted to double check with
the community that no other internal Solr/Lucene task can change the index
file size in the absence of submits.


Can index size increase when no updates/optimizes are happening?

2011-09-13 Thread Yury Kats
One of my users observed that the index size (in bytes)
increased over night. There was no indexing activity
at that time, only querying was taking place.

Running optimize brought the index size back down to
what it was when indexing finished the day before.

What could explain that?



Re: Parameter not working for master/slave

2011-09-12 Thread Yury Kats
On 9/11/2011 11:24 PM, William Bell wrote:
 I am using 3.3 SOLR. I tried passing in -Denable.master=true and
 -Denable.slave=true on the Slave machine.
 Then I changed solrconfig.xml to reference each as per:
 
 http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

These are core parameters, you need to set them in solr.xml per core.


Re: Replication setup with SolrCloud/Zk

2011-09-10 Thread Yury Kats
On 9/10/2011 3:54 PM, Pulkit Singhal wrote:
 Hi Yury,
 
 How do you manage to start the instances without any issues? The way I see
 it, no matter which instance is started first, the slave will complain about
 not being able to find its respective master because that instance hasn't been
 started yet ... no?

Yes, but it's not a big deal. The slave polls periodically, so next time
around the master will be up.


Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-09 Thread Yury Kats
On 9/9/2011 10:52 AM, Pulkit Singhal wrote:
 Thank You Yury. After looking at your thread, there's something I must
 clarify: Is solr.xml not uploaded and held in ZooKeeper? 

Not as far as I understand. Cores are loaded/created by the local
Solr server based on solr.xml and then registered with ZK, so that
ZK knows what cores are out there and how they are organized in shards.


 because you have a slightly different config between Node 1 & 2:
 http://lucene.472066.n3.nabble.com/Replication-setup-with-SolrCloud-Zk-td2952602.html


I have two shards, each shard having a master and a slave core.
Cores are located so that master and slave are on different nodes.
This protects search (but not indexing) from node failure.


Re: SolrCloud and replica question

2011-09-09 Thread Yury Kats
On 9/9/2011 4:48 PM, Jamie Johnson wrote:
 When doing writes, do all writes need to be done to the primary shard
 or are writes that are done to the replica also pushed to all replicas
 of that shard?
 

If you have replication set up between cores, all changes to the
slave will be overwritten by replication. Therefore it makes sense
to submit docs for indexing only to the master cores.


Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-09 Thread Yury Kats
On 9/9/2011 6:54 PM, Pulkit Singhal wrote:
 Thanks Again.
 
 Another question:
 
 My solr.xml has:
   <cores adminPath="/admin/cores" defaultCoreName="master1">
     <core name="master1" instanceDir="." shard="shard1" collection="myconf"/>
   </cores>
 
 And I omitted -Dcollection.configName=myconf from the startup command
 because I felt that specifying collection=myconf should take care of
 that:
 cd /trunk/solr/example
 java -Dbootstrap_confdir=./solr/conf -Dslave=disabled -DzkRun -jar start.jar

With this you are telling ZK to bootstrap a collection with the content of
specific files, but you don't tell it which collection that should be.

Hence you want the collection.configName parameter, and you want
solr.xml to reference the same name in the 'collection' attribute of the cores,
so that SolrCloud knows where to pull each core's configuration from.

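I.e., something like this for your example (adding the parameter to your
startup command):

java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -Dslave=disabled -DzkRun -jar start.jar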



Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-07 Thread Yury Kats
On 9/7/2011 3:18 PM, Pulkit Singhal wrote:
 Hello,
 
 I'm working off the trunk and the following wiki link:
 http://wiki.apache.org/solr/SolrCloud
 
 The wiki link has a section that seeks to quickly familiarize a user
 with replication in SolrCloud - Example B: Simple two shard cluster
 with shard replicas
 
 But after going through it, I have to wonder if this is truly
 replication? 

Not really. Replication is not set up in the example.
The example uses replicas as copies, to demonstrate high search
availability.

 Because if it is truly replication then somewhere along
 the line, the following properties must have been set
 programmatically:
 replicateAfter, confFiles, masterUrl, pollInterval
 Can someone tell me: Where exactly in the code is this happening?

Nowhere.

If you want replication, you need to set all the properties you listed
in solrconfig.xml.

I've done it recently, see 
http://lucene.472066.n3.nabble.com/Replication-setup-with-SolrCloud-Zk-td2952602.html



Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Yury Kats
On 8/8/2011 4:34 PM, Jason Toy wrote:
 Aelexei, thank you , that does seem to work.
 
 My sort results seem to be totally wrong though; I'm not sure if it's because
 of my sort function or something else.
 
 My query consists of:
 sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100
 And I get back 4571232 hits.

That would be the total number of docs, I guess.
Since your query is *:*, ie find everything.

 None of the results have the phrase indie music anywhere in their data.

You are only sorting on termfreq of indie music, you are not querying
documents that contain it.


Re: Example Solr Config on EC2

2011-08-08 Thread Yury Kats
On 8/8/2011 5:03 PM, Matt Shields wrote:
 I'm looking for some examples of how to set up Solr on EC2.  The
 configuration I'm looking for would have multiple nodes for redundancy.
  I've tested in-house with a single master and slave with replication
 running in Tomcat on Windows Server 2003, but even if I have multiple slaves
 the single master is a single point of failure.  Any suggestions or example
 configurations?

This article describes various configurations:
http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e410


Re: cores vs indices

2011-08-07 Thread Yury Kats
On 8/8/2011 12:00 AM, Daniel Schobel wrote:
 Can someone provide me with a succinct definition of what a solr core
 is? Is there a one-to-one relationship of cores to solr indices or can
 you have multiple indices per core?

http://wiki.apache.org/solr/CoreAdmin

There's one index per core.


SolrCloud: is there a programmatic way to create an ensemble

2011-08-02 Thread Yury Kats
I have multiple SolrCloud instances, each running its own Zookeeper
(Solr launched with -DzkRun).

I would like to create an ensemble out of them. I know about -DzkHost
parameter, but can I achieve the same programmatically? Either with
SolrJ or REST API?

Thanks,
Yury


CoreAdminHandler: can I specify custom properties when creating cores?

2011-07-25 Thread Yury Kats
When creating cores through solr.xml, I am able to specify custom
properties, to be referenced in solrconfig.xml. For example:

 <cores adminPath="/admin/cores" defaultCoreName="master">
   <core name="master" instanceDir="core1" shard="shard1" collection="myconf">
     <property name="enable.master" value="true" />
   </core>
   <core name="slave" instanceDir="core2" shard="shard2" collection="myconf">
     <property name="enable.slave" value="true" />
     <property name="masterHost" value="node2:8983" />
   </core>
 </cores>

This would create a master core and a slave core, participating in replication,
both sharing the same solrconfig.xml for replication setup.

Is there a way to specify such properties when creating cores through a 
CoreAdminHandler
request [1]?

Thanks,
Yury

[1] http://wiki.apache.org/solr/CoreAdmin#CREATE


Re: what is the need of setting autocommit in solrconfig.xml

2011-05-27 Thread Yury Kats
On 5/27/2011 6:48 AM, Romi wrote:
 What is the benefit of setting autocommit in solrconfig.xml?
 I read somewhere that these settings control how often pending updates will
 be automatically pushed to the index.
 Does it mean that if the solr server is running, it automatically starts the
 indexing process if it finds any updates in the database?

No, it means it automatically commits recently added documents to the index
so that they become searchable.
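
This is configured in the updateHandler section of solrconfig.xml; a sketch
(the values are just an illustration):

<autoCommit>
  <maxDocs>10000</maxDocs> <!-- commit after 10,000 pending docs -->
  <maxTime>60000</maxTime> <!-- or after 60 seconds (ms), whichever comes first -->
</autoCommit>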


Re: problem in setting field attribute in schema.xml

2011-05-25 Thread Yury Kats
On 5/25/2011 9:29 AM, Romi wrote:
 and in http://wiki.apache.org/solr/SchemaXml#Fields it is clearly mentioned
 that a non-indexed field is not searchable, so why am I getting search
 results? Why should stored=true matter if indexed=false?

indexed controls whether you can find the document based on the content of 
this field.
stored controls whether you will see the content of this field in the result.
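
For example (hypothetical field names):

<field name="body" type="text" indexed="true" stored="false"/> <!-- searchable, content not returned -->
<field name="url" type="string" indexed="false" stored="true"/> <!-- returned, but not searchable -->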



Re: Storing, indexing and searching XML documents in Solr

2011-05-18 Thread Yury Kats
On 5/18/2011 4:19 PM, Judioo wrote:

 Any help is greatly appreciated. Pointers to documentation that address my
 issues are even more helpful.

I think this would be a good start:
http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource


Replication setup with SolrCloud/Zk

2011-05-17 Thread Yury Kats
Hi,

I have two Solr nodes, each managing two cores -- a master core and a slave 
core.
The slaves are set up to replicate from the other node's master.
That is, node1.master -> node2.slave, node2.master -> node1.slave.

The replication is configured in each core's solrconfig.xml, eg

Master's solrconfig.xml on both nodes:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
</requestHandler>

node1.Slave's solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://node2:8983/solr/master/replication</str>
    <str name="pollInterval">01:00:00</str>
  </lst>
</requestHandler>

node2.Slave's solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://node1:8983/solr/master/replication</str>
    <str name="pollInterval">01:00:00</str>
  </lst>
</requestHandler>

This is all working great with regular Solr. I am now trying to move
to SolrCloud/ZK and can't figure out how to keep my replication settings.

The SolrCloud/ZK setup seems to manage one configuration for all cores/nodes
in the cluster, yet I need to keep 3 different solrconfig.xml files apart -- one
for the masters and one for each of the slaves. The rest of the configuration
(schema.xml etc) is identical to all cores and can be shared.

I found a reference to master/slave setup with Zk in the wiki [1].
Has it been implemented or is this a proposal? If it is implemented,
it's not quite clear to me how to set up the ReplicationHandler
to have 2 different slave cores pull from two different masters.

Any help/idea would be appreciated!

Thanks,
Yury

[1] http://wiki.apache.org/solr/ZooKeeperIntegration#Master.2BAC8-Slave



Re: Replication setup with SolrCloud/Zk

2011-05-17 Thread Yury Kats
On 5/17/2011 10:17 AM, Stefan Matheis wrote:
 Yury,
 
 perhaps Java-Params (like used for this sample:
 http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node)
 can help you?

Ah, thanks! It does seem to work!

Cluster's solrconfig.xml (shared between all Solr instances and cores via 
SolrCloud/ZK):
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="pollInterval">00:01:00</str>
    <str name="masterUrl">http://${masterHost:xyz}/solr/master/replication</str>
  </lst>
</requestHandler>

Node 1 solr.xml:
  <cores adminPath="/admin/cores" defaultCoreName="master">
    <core name="master" instanceDir="core1" shard="shard1" collection="myconf">
      <property name="enable.master" value="true" />
    </core>
    <core name="slave" instanceDir="core2" shard="shard2" collection="myconf">
      <property name="enable.slave" value="true" />
      <property name="masterHost" value="node2:8983" />
    </core>
  </cores>

Node 2 solr.xml:
  <cores adminPath="/admin/cores" defaultCoreName="master">
    <core name="master" instanceDir="core1" shard="shard2" collection="myconf">
      <property name="enable.master" value="true" />
    </core>
    <core name="slave" instanceDir="core2" shard="shard1" collection="myconf">
      <property name="enable.slave" value="true" />
      <property name="masterHost" value="node1:8983" />
    </core>
  </cores>



Re: Specifying backup location in solrconfig.xml

2011-05-17 Thread Yury Kats
I would create a replication slave, for which you can specify whatever
location you want, even put it on a different machine. If run on the same
machine, the slave can be another core in the same Solr instance.


On 5/17/2011 2:20 PM, Dietrich wrote:
 I am using Solr Replication to create a snapshot for backup purposes
 after each optimize:
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="backupAfter">optimize</str>
     <str name="confFiles">schema.xml,mapping-ISOLatin1Accent.txt,protwords.txt,stopwords.txt,synonyms.txt,elevate.xml</str>
   </lst>
 </requestHandler>
 
 
 That works fine, but i need to create the snapshots somewhere outside
 the data directory. I tried specifying a location like this:
 <str name="location">${solr.home}/backup/site</str>
 or
 <str name="location">/opt/solr/backup/site</str>
 
 but Solr is complaining:
 SEVERE: java.io.IOException: Cannot run program "snapshooter" (in
 directory "solr/bin"): java.io.IOException: error=2, No such file or
 directory
 
 How can I specify the location for the backup in solrconfig.xml?
 
 Dietrich