Re: How to find the routing algorithm used?

2013-05-16 Thread Furkan KAMACI
At admin gui click on the Cloud link then Tree link. A page will open
and choose clusterstate.json from list. Scroll down to end and you will see
something like: router:compositeId
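
For reference, in a clusterstate.json that does carry this information, the
relevant part looks roughly like the following (the collection name and other
fields here are illustrative):

  "collection1": {
    "shards": { ... },
    "router": "compositeId"
  }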



2013/5/16 santoash santo...@me.com

 I'm trying to find out which routing algorithm (implicit/composite id) is
 being used in my cluster. We are running solr 4.1. I was expecting to see
 it in my clusterState (based on a previous thread that someone else posted)
 but  I don't see it there. Could someone please help?

 Thanks!

 Santoash




Compatible collections SOLR4 / SOLRCloud?

2013-05-16 Thread Marcin

Hi there,

I am trying to figure out what SOLR means by compatible collection in 
order to be able to run the following query:


Query all shards of multiple compatible collections, explicitly specified:

http://localhost:8983/solr/collection1/select?collection=collection1_NY,collection1_NJ,collection1_CT

Does this mean that the schema.xml must be exactly same between those 
collections or just partially same (share same fields used to satisfy 
the query)?


cheers,
/Marcin


Re: error while switching from log4j back to slf4j with solr 4.3

2013-05-16 Thread Bernd Fehling
OK, solved.
I now have run-jetty-run running with log4j.
Just copied log4j libs from example/lib/ext to webapp/WEB-INF/classes and
set -Dlog4j.configuration in run-jetty-run VM classpath.
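
For reference, that VM argument usually takes a form like this (the
properties file path below is just an illustration):

  -Dlog4j.configuration=file:///path/to/log4j.properties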

Thanks,
Bernd

Am 15.05.2013 16:31, schrieb Shawn Heisey:
 On 5/15/2013 12:52 AM, Bernd Fehling wrote:
 while I can't get solr 4.3 with run-jetty-run up and running under eclipse
 for debugging I tried to switch back to slf4j and followed
 the steps of http://wiki.apache.org/solr/SolrLogging

 Unfortunately eclipse bothers me with an error:
 The import org.apache.log4j.AppenderSkeleton cannot be resolved
 EventAppender.java   
 /solr/core/src/java/org/apache/solr/logging/log4jline 19 Java Problem

 log4j-over-slf4j-1.6.6.jar has no class AppenderSkeleton as log4j-1.2.16.jar 
 does.
 
 Can you please send a listing of the directory where you have your slf4j
 jars and the full exception stacktrace from your log?  Please use a
 paste website, such as pastie.org.
 
 Thanks,
 Shawn
 


Re: indexing unrelated tables in single core

2013-05-16 Thread Rohan Thakur
I am not able to index the fields from the database; the import is failing...

data-config.xml

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/test"
            user="user" password="dfsdf"/>
<document>
  <entity name="catalogsearch_query" query="select query_id,query_text
      from catalogsearch_query where num_results!= 0">
    <field column="query_id" name="query_id"/>
    <field column="query_text" name="user_query"/>
  </entity>
</document>

It's showing all documents failed and 0 indexed.


On Wed, May 15, 2013 at 8:31 PM, Alexandre Rafalovitch
arafa...@gmail.comwrote:

 1. Create a schema that accomodates both types of fields either using
 optional fields or dynamic fields.
 2. Create some sort of differentiator key (e.g. schema), separately
 from id (which needs to be globally unique, so possibly schema+id)
 3. Use that schema in filter queries (fq) to look only at subject of items
 4. (Optionally) define separate search request handlers that force
 that schema parameter (using appends or invariants instead of
 defaults)

 That should get you most of the way there.
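
 To illustrate step 4, a rough sketch of a request handler that forces such a
 filter (the handler name and field name are made up for illustration):

   <requestHandler name="/search-products" class="solr.SearchHandler">
     <lst name="invariants">
       <str name="fq">doctype:product</str>
     </lst>
   </requestHandler>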

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, May 15, 2013 at 7:07 AM, Rohan Thakur rohan.i...@gmail.com
 wrote:
  hi all
 
 
  I want to index 2 separate unrelated tables from database into single
 solr
  core and search in any one of the document separately how can I do it?
  please help
 
  thanks in advance
  regards
  Rohan



Re: indexing unrelated tables in single core

2013-05-16 Thread Rohan Thakur
It's saying in the logs that the required field "title" is missing, which is
nowhere in the database...


On Thu, May 16, 2013 at 3:08 PM, Rohan Thakur rohan.i...@gmail.com wrote:

 I am not able to index the fields from data base its getting failed...

 data-config.xml

 dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver
  url=jdbc:mysql://localhost/test
 user=user password=dfsdf/
  document
 entity name=catalogsearch_query query=select query_id,query_text
 from catalogsearch_query where num_results!= 0
field column=query_id name=query_id/
field column=query_text name=user_query/
 /entity
 /document

 its showing all failed and 0 indexed


 On Wed, May 15, 2013 at 8:31 PM, Alexandre Rafalovitch arafa...@gmail.com
  wrote:

 1. Create a schema that accomodates both types of fields either using
 optional fields or dynamic fields.
 2. Create some sort of differentiator key (e.g. schema), separately
 from id (which needs to be globally unique, so possibly schema+id)
 3. Use that schema in filter queries (fq) to look only at subject of items
 4. (Optionally) define separate search request handlers that force
 that schema parameter (using appends or invariants instead of
 defaults)

 That should get you most of the way there.

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, May 15, 2013 at 7:07 AM, Rohan Thakur rohan.i...@gmail.com
 wrote:
  hi all
 
 
  I want to index 2 separate unrelated tables from database into single
 solr
  core and search in any one of the document separately how can I do it?
  please help
 
  thanks in advance
  regards
  Rohan





Re: indexing unrelated tables in single core

2013-05-16 Thread Michael Della Bitta
True, it's complaining that your Solr schema has a required field 'title'
and your query and data import config aren't providing it.
On May 16, 2013 5:51 AM, Rohan Thakur rohan.i...@gmail.com wrote:

 its saying in the logs that missing required field title which is no where
 in the database...


 On Thu, May 16, 2013 at 3:08 PM, Rohan Thakur rohan.i...@gmail.com
 wrote:

  I am not able to index the fields from data base its getting failed...
 
  data-config.xml
 
  dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver
   url=jdbc:mysql://localhost/test
  user=user password=dfsdf/
   document
  entity name=catalogsearch_query query=select query_id,query_text
  from catalogsearch_query where num_results!= 0
 field column=query_id name=query_id/
 field column=query_text name=user_query/
  /entity
  /document
 
  its showing all failed and 0 indexed
 
 
  On Wed, May 15, 2013 at 8:31 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
   wrote:
 
  1. Create a schema that accomodates both types of fields either using
  optional fields or dynamic fields.
  2. Create some sort of differentiator key (e.g. schema), separately
  from id (which needs to be globally unique, so possibly schema+id)
  3. Use that schema in filter queries (fq) to look only at subject of
 items
  4. (Optionally) define separate search request handlers that force
  that schema parameter (using appends or invariants instead of
  defaults)
 
  That should get you most of the way there.
 
  Regards,
 Alex.
  Personal blog: http://blog.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all
  at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
  book)
 
 
  On Wed, May 15, 2013 at 7:07 AM, Rohan Thakur rohan.i...@gmail.com
  wrote:
   hi all
  
  
   I want to index 2 separate unrelated tables from database into single
  solr
   core and search in any one of the document separately how can I do it?
   please help
  
   thanks in advance
   regards
   Rohan
 
 
 



Lucene-Solr indexing document via Post method

2013-05-16 Thread Rider Carrion Cleger
Hi guys,

I'm trying to run Solr with Apache Tomcat. Is it possible to index
documents via the POST method using Lucene-Solr? What is the correct way to
index documents in Solr?

thanks


loading dataimport configs

2013-05-16 Thread Nathan Findley
I am using the data importer that feeds off of MySQL. When adding new 
DataImportHandler requestHandlers to solrconfig.xml, I can upload my 
changes with the following command:


./zkcli.sh -zkhost 10.0.1.107:2181 -cmd upconfig -confdir configs 
-confname collection1
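
For context, the kind of handler registration being added looks something
like this in solrconfig.xml (the config file name here is an assumption):

  <requestHandler name="/data-point-1"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-point-1-config.xml</str>
    </lst>
  </requestHandler>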


Good: I can see the changed files in zookeeper. I can get them and see 
that the contents changed as well.


Bad: When I call

curl http://localhost:8983/solr/collection1/data-point-1?command=full-import

or browse to http://solr-ip/solr/#/collection1/dataimport/data-point-1

solr is complaining that the config for data-point-1 (for example) 
cannot be found. Any ideas what I might be doing wrong?


--
CTO
Zenlok株式会社



Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry
-- *Edismax and Filter Queries with Commas and spaces* --

Dear Experts,

This appears to be a bug, please suggest if I'm wrong.

If I search with the following filter query,

1) fq=title:(, 10)

- I get no results.
- The debug output does NOT show the section containing
parsed_filter_queries

if I carry a search with the filter query,

2) fq=title:(,10) - (No space between , and 10)

- I get results and the debug output shows the parsed filter queries
section as,
<arr name="filter_queries">
  <str>(titles:(,10))</str>
  <str>(collection:assets)</str>

As you can see above, I'm also passing in other filter queries
(collection:assets) which appear correctly but they do not appear in case 1
above.

I can't make this as part of the query parameter as that needs to be
searched against multiple fields.

Can someone suggest a fix in this case please. I'm using Solr 4.0.

Many Thanks,
Sandeep


Re: indexing unrelated tables in single core

2013-05-16 Thread Rohan Thakur
hi

I found the problem: it is with the unique key defined in schema.xml.
If I define it to be query_id, then while indexing it says the mandatory key
query_id is missing because it is not present in the root entity
(data-config.xml) that indexes the products from the database, which has
product_id as its unique key. And when I set product_id as the unique key in
the schema, it says the mandatory key product_id is missing because it is not
present in the root entity (data-config.xml) that indexes the user queries
from the other table in the database, which has user_id as its unique key.

How can I fix this? Thanks. I want to index both tables, which are basically
unrelated, that is, they do not have any *common* fields.

thanks
rohan


On Thu, May 16, 2013 at 3:24 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 True, it's complaining that your Solr schema has a required field 'title'
 and your query and data import config aren't providing it.
 On May 16, 2013 5:51 AM, Rohan Thakur rohan.i...@gmail.com wrote:

  its saying in the logs that missing required field title which is no
 where
  in the database...
 
 
  On Thu, May 16, 2013 at 3:08 PM, Rohan Thakur rohan.i...@gmail.com
  wrote:
 
   I am not able to index the fields from data base its getting failed...
  
   data-config.xml
  
   dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver
url=jdbc:mysql://localhost/test
   user=user password=dfsdf/
document
   entity name=catalogsearch_query query=select
 query_id,query_text
   from catalogsearch_query where num_results!= 0
  field column=query_id name=query_id/
  field column=query_text name=user_query/
   /entity
   /document
  
   its showing all failed and 0 indexed
  
  
   On Wed, May 15, 2013 at 8:31 PM, Alexandre Rafalovitch 
  arafa...@gmail.com
wrote:
  
   1. Create a schema that accomodates both types of fields either using
   optional fields or dynamic fields.
   2. Create some sort of differentiator key (e.g. schema), separately
   from id (which needs to be globally unique, so possibly schema+id)
   3. Use that schema in filter queries (fq) to look only at subject of
  items
   4. (Optionally) define separate search request handlers that force
   that schema parameter (using appends or invariants instead of
   defaults)
  
   That should get you most of the way there.
  
   Regards,
  Alex.
   Personal blog: http://blog.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all
   at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
   book)
  
  
   On Wed, May 15, 2013 at 7:07 AM, Rohan Thakur rohan.i...@gmail.com
   wrote:
hi all
   
   
I want to index 2 separate unrelated tables from database into
 single
   solr
core and search in any one of the document separately how can I do
 it?
please help
   
thanks in advance
regards
Rohan
  
  
  
 



Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Hoggarth, Gil
Hi all, I hope you can advise a solution to our incorrect data directory
issue.

 

We have 2 physical servers using Solr 4.3.0, each with 24 separate
tomcat instances (RedHat 6.4, java 1.7.0_10-b18, tomcat 7.0.34) with a
solr shard in each. This configuration means that each shard has its own
data directory declared. (Server OS, tomcat and solr, including shards,
created via automated builds.) 

 

That is, for example,

- tomcat instance, /var/local/tomcat/solrshard3/, port 8985

- corresponding solr instance, /usr/local/solrshard3/, with
/usr/local/solrshard3/collection1/conf/solrconfig.xml

- corresponding solr data directory,
/var/local/solrshard3/collection1/data/

 

We process ~1.5 billion documents, which is why we use 48 shards (24
leaders, 24 replicas). These physical servers are rebooted regularly to
fsck their drives. When rebooted, we always see several (~10-20) shards
failing to start (UI cloud view shows them as 'Down' or 'Recovering'
though they never recover without intervention), though there is not a
pattern to which shards fail to start - we haven't recorded any that
always or never fail. On inspection, the UI dashboard for these failed
shards displays, for example:

- HostServer1

- Instance/usr/local/sholrshard3/collection1

- Data/var/local/solrshard6/collection1/data

- Index  /var/local/solrshard6/collection1/data/index

 

To fix such failed shards, I manually restart the shard leader and
replicas, which fixes the issue. However, of course, I would like to
know a permanent cure for this, not a remedy.

 

We use a separate zookeeper service, spread across 3 Virtual Machines
within our private network of ~200 servers (physical and virtual).
Network traffic is constant but relatively little across 1GB bandwidth.

 

Any advice or suggestions greatly appreciated.

Gil

 

Gil Hoggarth

Web Archiving Engineer

The British Library, Boston Spa, West Yorkshire, LS23 7BQ

 



Re: indexing unrelated tables in single core

2013-05-16 Thread Rohan Thakur
I mean to say that

I want to index 2 tables, that is, using 2 root entities in data-config.xml:
one is the product table and the other is the user search table. These have
no foreign key relation, and I want to index both of them as documents in my
Solr index. What should I do? It takes either one of them and rejects the
other table's documents when I use the primary key of one table as the unique
key in the Solr schema, and vice versa. How do I solve this?


On Thu, May 16, 2013 at 4:24 PM, Rohan Thakur rohan.i...@gmail.com wrote:

 hi

 I got the problem it is with the unique key defined in the schema.xml
 if i difine it to be query_id then while indexing it says
 missing mandatory key query_id which is not present in the root
 entity(data-config.xml) which is indexing the product from the database
 which has product_id as the unique key and when in schema I set product_id
 as the unique key then it says missing mandatory key product_id which is
 not present in the root entity(data-config.xml) which is indiexing the user
 query from another table in the database which has user_id as the unique
 key.

 how can I fix this thanks I want to index both the tables which are
 basically unrelated that is does not have any *Common*  fields

 thanks
 rohan


 On Thu, May 16, 2013 at 3:24 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:

 True, it's complaining that your Solr schema has a required field 'title'
 and your query and data import config aren't providing it.
 On May 16, 2013 5:51 AM, Rohan Thakur rohan.i...@gmail.com wrote:

  its saying in the logs that missing required field title which is no
 where
  in the database...
 
 
  On Thu, May 16, 2013 at 3:08 PM, Rohan Thakur rohan.i...@gmail.com
  wrote:
 
   I am not able to index the fields from data base its getting failed...
  
   data-config.xml
  
   dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver
url=jdbc:mysql://localhost/test
   user=user password=dfsdf/
document
   entity name=catalogsearch_query query=select
 query_id,query_text
   from catalogsearch_query where num_results!= 0
  field column=query_id name=query_id/
  field column=query_text name=user_query/
   /entity
   /document
  
   its showing all failed and 0 indexed
  
  
   On Wed, May 15, 2013 at 8:31 PM, Alexandre Rafalovitch 
  arafa...@gmail.com
wrote:
  
   1. Create a schema that accomodates both types of fields either using
   optional fields or dynamic fields.
   2. Create some sort of differentiator key (e.g. schema), separately
   from id (which needs to be globally unique, so possibly schema+id)
   3. Use that schema in filter queries (fq) to look only at subject of
  items
   4. (Optionally) define separate search request handlers that force
   that schema parameter (using appends or invariants instead of
   defaults)
  
   That should get you most of the way there.
  
   Regards,
  Alex.
   Personal blog: http://blog.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all
   at once. Lately, it doesn't seem to be working.  (Anonymous  - via
 GTD
   book)
  
  
   On Wed, May 15, 2013 at 7:07 AM, Rohan Thakur rohan.i...@gmail.com
   wrote:
hi all
   
   
I want to index 2 separate unrelated tables from database into
 single
   solr
core and search in any one of the document separately how can I do
 it?
please help
   
thanks in advance
regards
Rohan
  
  
  
 





Re: indexing unrelated tables in single core

2013-05-16 Thread Gora Mohanty
On 16 May 2013 16:24, Rohan Thakur rohan.i...@gmail.com wrote:
 hi

 I got the problem it is with the unique key defined in the schema.xml
 if i difine it to be query_id then while indexing it says
 missing mandatory key query_id which is not present in the root
 entity(data-config.xml) which is indexing the product from the database
 which has product_id as the unique key and when in schema I set product_id
 as the unique key then it says missing mandatory key product_id which is
 not present in the root entity(data-config.xml) which is indiexing the user
 query from another table in the database which has user_id as the unique
 key.

 how can I fix this thanks I want to index both the tables which are
 basically unrelated that is does not have any *Common*  fields
[...]

Fix it in the SELECT statement:
  SELECT product_id as id,... for one entity, and
  SELECT query_id as id,... in the other
and use id as the uniqueKey for Solr.
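
For example, a minimal data-config sketch of that approach, reusing the table
and column names from this thread (the exact attributes are illustrative, not
a tested configuration):

  <document>
    <entity name="products"
            query="SELECT product_id AS id, value AS title FROM catalog_product_entity_varchar WHERE attribute_id=60">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
    <entity name="queries"
            query="SELECT query_id AS id, query_text FROM catalogsearch_query WHERE num_results != 0">
      <field column="id" name="id"/>
      <field column="query_text" name="user_query"/>
    </entity>
  </document>

with <uniqueKey>id</uniqueKey> in schema.xml.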

Regards,
Gora


Re: indexing unrelated tables in single core

2013-05-16 Thread Rohan Thakur
Hi Mohanty,

I appreciate it, but I didn't get that. Can you please elaborate?
My data-config is like:
<entity name="catalogsearch_query" query="select query_id,query_text from
    catalogsearch_query where num_results!= 0">
  <field column="query_id" name="value_id"/>
  <field column="query_text" name="user_query"/>
</entity>

<entity name="catalog_product_entity_varchar" query="select
    value_id,value,entity_id,attribute_id from catalog_product_entity_varchar
    where attribute_id=60">
  <field column="value_id" name="value_id"/>
  <field column="value" name="title"/>
  <field column="entity_id" name="product_id"/>
  <field column="attribute_id" name="attribute"/>
</entity>


my schema is like:
<fields>
  <field name="keyfeatures" type="text_en_splitting" indexed="true" stored="true" required="false"/>
  <field name="value_id" type="plong" indexed="true" stored="false"/>
  <field name="product_id" type="plong" indexed="true" stored="true"/>
  <field name="features" type="text_en_splitting_tight" indexed="true" stored="false" required="false" multiValued="true"/>
  <!-- <field name="f_product_id" type="plong" indexed="true" stored="true"/>
  <field name="f_value_id" type="plong" indexed="true" stored="true"/> -->
  <field name="attribute" type="plong" indexed="false" stored="false"/>
  <field name="title" type="text_en_splitting" indexed="true" stored="true" required="true"/>
  <field name="image" type="text_en_splitting_tight" indexed="false" stored="false"/>
  <field name="url" type="text_en_splitting_tight" indexed="false" stored="false"/>
  <field name="brand" type="text_en" indexed="true" stored="true"/>
  <field name="procat" type="text_en" indexed="true" stored="true"/>
  <field name="rootcat" type="text_en" indexed="true" stored="true"/>
  <field name="color" type="text_en" indexed="true" stored="true"/>
  <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true"/>
  <field name="spell" type="tSpell" indexed="true" stored="true"/>
  <field name="query_id" type="plong" indexed="true" stored="true"/>
  <field name="user_query" type="text_en_splitting_tight" indexed="true" stored="true" required="false"/>
</fields>

<uniqueKey>value_id</uniqueKey>

<!-- <field name="solr_value" type="text" indexed="true" stored="true"/> -->
<!-- field for the QueryParser to use when an explicit fieldname is absent
     DEPRECATED: specify df in your request handler instead. -->

<defaultSearchField>title</defaultSearchField>


thanks regards
Rohan


On Thu, May 16, 2013 at 5:11 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 16 May 2013 16:24, Rohan Thakur rohan.i...@gmail.com wrote:
  hi
 
  I got the problem it is with the unique key defined in the schema.xml
  if i difine it to be query_id then while indexing it says
  missing mandatory key query_id which is not present in the root
  entity(data-config.xml) which is indexing the product from the database
  which has product_id as the unique key and when in schema I set
 product_id
  as the unique key then it says missing mandatory key product_id which is
  not present in the root entity(data-config.xml) which is indiexing the
 user
  query from another table in the database which has user_id as the unique
  key.
 
  how can I fix this thanks I want to index both the tables which are
  basically unrelated that is does not have any *Common*  fields
 [...]

 Fix it in the SELECT statement:
   SELECT product_id as id,... for one entity, and
   SELECT query_id as id,... in the other
 and use id as the uniqueKey for Solr.

 Regards,
 Gora



Re: Can we search some mandatory words and some optional words in SOLR

2013-05-16 Thread Kamal Palei
Thanks Hoss, I modified accordingly.
One more thing I observed, if I give search key as one of the below

1. +Java +mysql +php +(TCL Perl Selenium) -ethernet -switching -routing
2. +(TCL Perl Selenium) -ethernet -switching -routing
3. +(TCL Perl Selenium)

It works as expected. Like if key is +(TCL Perl Selenium) , then it
searches documents having atleast one or more  keyword out of TCL Perl
Selenium.

Best Regards
Kamal






On Wed, May 15, 2013 at 10:58 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : +Java +mysql +php TCL Perl Selenium -ethernet -switching -routing

 that's missing one of the started requirements...

 : 2. Atleast one keyword out of* TCL Perl Selenium* should be present

 ...should be...

+Java +mysql +php +(TCL Perl Selenium) -ethernet -switching -routing


 -Hoss



Concurrent connections

2013-05-16 Thread Arkadi Colson
Is there a limitation on the number of concurrent connections to a Solr 
host? We have some scripts running simultaneously to fill Solr, and when 
starting up too many of them we get this error:



exception 'SolrClientException' with message 'Unsuccessful update 
request. Response Code 0. (null)' in solr_queue_processor.php:467

Stack trace:
#0 solr_queue_processor.php(467): 
SolrClient-addDocument(Object(SolrInputDocument))

#1 {main}

Thx



Is payload the right solution for my problem?

2013-05-16 Thread NabbleUser
Hi,

I recently read about payloads in the Apache Solr 4 Cookbook and would like
to know if this is the
right solution for my problem or if other methods are more suitable.

Generally, I need to perform fulltext search in a field (including
highlighting) where I need metadata per token in the search result, but I do
not need to search in that metadata.

I have documents containing data (not natural language), where each data
entry contains multiple metadata informations. An example with a sentence
and as XML-like structure could be
<meta attr1="val11" attr2="val2" attr3="val3">This</meta>
<meta attr1="val13" attr2="val7" attr3="val3">is</meta>
<meta attr1="val16" attr2="val22" attr3="val3">one</meta>
<meta attr1="val14" attr2="val2" attr3="val3">sentence.</meta>
Additionaly there exist some fields per document that i need for faceting
etc. (id, category, timestamp etc.)

When searching, I want to search only in This is one sentence., a search
for attr1 or val3 should give no results. However, when searching for
one in the search response I need to know attr1=val16 attr2=val22 and
attr3=val3.

My first intuition when creating the schema was to create a multiValue field
content containing each word in the document. Then I add attr1, attr2 and
attr3 as payload to each word/token.
Is this the right way to use payloads? Or is there a better solution for
such a task?
I imagine this to be a common use case: searching in a cleaned version of
the data and returning the original one.

Could anyone please provide suggestions on how to tackle such a task? The
book and the Solr wiki pages
did not lead me to anything that I could immediately identify as a solution
to my problem.

If the proposed solution depends on the data: each document might have 3-8
additional attributes, and there might be between 100-1 tokens per
document. 
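
For what it's worth, a delimited-payload field type, which is one way to
attach per-token metadata of the kind described above, looks roughly like
this; the delimiter, encoder and tokenizer here are assumptions, not a tested
configuration:

<fieldType name="payloads" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|" encoder="identity"/>
  </analyzer>
</fieldType>

With such a type, a token submitted as e.g. one|val16 keeps "one" as the
searchable term and stores the part after the delimiter as the token's
payload rather than as searchable text.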

Regards



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-payload-the-right-solution-for-my-problem-tp4063814.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Hierarchical Faceting

2013-05-16 Thread varsha.yadav

Hi,

Thanks Upayavira, but I am still not getting the results I want.
I have been following http://wiki.apache.org/solr/HierarchicalFaceting
I have hierarchical data for facet . Some documents also have multiple 
hierarchy. like :

Doc#1 London  UK  51.5
Doc#2 UK 54.0
Doc#3 Indiana  United States  40.0, London UK51.5
Doc#4 United States  39.7, Washington  United States  38.8

What can be an optimal schema for indexing this data so that I get the
following results from Solr queries:
1) I want to retrieve hierarchical data counts via a facet pivot query, e.g.
facet.pivot=country,state
2) I want the lat values for every document in the query output, e.g. Doc#3
40.0,51.5 and Doc#2 54.0
3) I can run direct search queries like country:United states or
state:Washington


I think through this I am able to express my requirement along with the data.
Please tell me how I can index the data and retrieve it through queries.
I checked out the solution you suggested about PathHierarchyTokenizerFactory,
but along with the hierarchy I have to store data under names like state,
district, lat, lon etc., so that I can also run direct queries on those fields.
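
For reference, the PathHierarchyTokenizerFactory type mentioned here
(descendent_path in the example schema) is defined along these lines; this is
quoted from memory, so treat it as a sketch:

<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>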


Thanks
Varsha

On 05/15/2013 10:32 PM, Upayavira wrote:

Can't you use the PathHierarchyTokenizerFactory mentioned on that page?
I think it is called descendent-path in the default schema. Won't that
get you what you want?

   UK/London/Covent Garden
becomes
   UK
   UK/London
   UK/London/Covent Garden

and
   India/Maharastra/Pune/Dapodi
becomes
   India
   India/Maharastra
   India/Maharastra/Pune
   India/Maharastra/Pune/Dapodi

These fields can be multivalued.

Upayavira

On Wed, May 15, 2013, at 12:29 PM, varsha.yadav wrote:

Hi

I go through that but i want to index multiple location in single
document and a single location have multiple feature/attribute like
country,state,district etc. I want  Index and want hierarchical facet
result on facet pivot query. One more thing , my document varies may
have single ,two ,three.. any number of location.


On 05/15/2013 03:55 PM, Upayavira wrote:

http://wiki.apache.org/solr/HierarchicalFaceting

On Wed, May 15, 2013, at 09:44 AM, varsha.yadav wrote:

Hi Everyone,

I am working on Hierarchical Faceting. I am indexing location of
document with their state and district.
I would like to find counts of every country with state count and
district count. I found facet pivot working well to give me count if i
use single valued fields like
---
<doc>
  <str name="country">india</str>
  <str name="state">maharashtra</str>
</doc>
<doc>
  <str name="country">india</str>
  <str name="state">gujrat</str>
</doc>
<doc>
  <str name="country">india</str>
  <str name="district">Faridabad</str>
  <str name="state">Haryana</str>
</doc>
<doc>
  <str name="country">china</str>
  <str name="district">foshan</str>
  <str name="state">guangdong</str>
</doc>

I found results that is fine :
<arr name="country,state,district,event">
  <lst>
    <str name="field">country</str>
    <str name="value">india</str>
    <int name="count">1</int>
    <arr name="pivot">
      <lst>
        <str name="field">state</str>
        <str name="value">maharashtra</str>
        <int name="count">1</int>
        <arr name="pivot"/>
      </lst>
      <lst>
        <str name="field">state</str>
        <str name="value">Haryana</str>
        <int name="count">1</int>
        <arr name="pivot">
          <lst>
            <str name="field">district</str>
            <str name="value">Faridabad</str>
            <int name="count">1</int>
          </lst>
        </arr>
      </lst>
    </arr>
  </lst>
  <lst>
    <str name="field">country</str>
    <str name="value">china</str>
    <int name="count">1</int>
    <arr name="pivot">
      <lst>
        <str name="field">state</str>

      </lst>
    </arr>


But if my document have multiple location like :

<doc>
  <arr name="location">
    <str>japan|JAPAN|null|</str>
    <str>brisbane|Australia|Queensland</str>
    <str>afghanistan|AFGHANISTAN|null</str>
  </arr>
</doc>

<doc>
  <arr name="location">
    <str>afghanistan|AFGHANISTAN|null</str>
  </arr>
</doc>

<doc>
  <arr name="location">
    <str>brisbane|Australia|Queensland</str>
  </arr>
</doc>


Can anyone tell , me how should i put data in solr index to get
hierarical data.

Thanks
Varsha


--
Thanks  Regards
Varsha




--
Thanks  Regards
Varsha



Re: Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Daniel Collins
What actual error do you see in Solr?  Is there an exception and if so, can
you post that?  As I understand it, datatDir is set from the solrconfig.xml
file, so either your instances are picking up the wrong file, or you have
some override which is incorrect?  Where do you set solr.data.dir, at the
environment when you start Solr or in solrconfig?
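
For reference, the per-core setting in question normally looks something like
this in solrconfig.xml (the path below just mirrors the layout described in
the original post):

  <dataDir>${solr.data.dir:/var/local/solrshard3/collection1/data}</dataDir>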


On 16 May 2013 12:23, Hoggarth, Gil gil.hogga...@bl.uk wrote:

 Hi all, I hope you can advise a solution to our incorrect data directory
 issue.



 We have 2 physical servers using Solr 4.3.0, each with 24 separate
 tomcat instances (RedHat 6.4, java 1.7.0_10-b18, tomcat 7.0.34) with a
 solr shard in each. This configuration means that each shard has its own
 data directory declared. (Server OS, tomcat and solr, including shards,
 created via automated builds.)



 That is, for example,

 - tomcat instance, /var/local/tomcat/solrshard3/, port 8985

 - corresponding solr instance, /usr/local/solrshard3/, with
 /usr/local/solrshard3/collection1/conf/solrconfig.xml

 - corresponding solr data directory,
 /var/local/solrshard3/collection1/data/



 We process ~1.5 billion documents, which is why we use so 48 shards (24
 leaders, 24 replicas). These physical servers are rebooted regularly to
 fsck their drives. When rebooted, we always see several (~10-20) shards
 failing to start (UI cloud view shows them as 'Down' or 'Recovering'
 though they never recover without intervention), though there is not a
 pattern to which shards fail to start - we haven't recorded any that
 always or never fail. On inspection, the UI dashboard for these failed
 shards displays, for example:

 - HostServer1

 - Instance/usr/local/sholrshard3/collection1

 - Data/var/local/solrshard6/collection1/data

 - Index  /var/local/solrshard6/collection1/data/index



 To fix such failed shards, I manually restart the shard leader and
 replicas, which fixes the issue. However, of course, I would like to
 know a permanent cure for this, not a remedy.



 We use a separate zookeeper service, spread across 3 Virtual Machines
 within our private network of ~200 servers (physical and virtual).
 Network traffic is constant but relatively little across 1GB bandwidth.



 Any advice or suggestions greatly appreciated.

 Gil



 Gil Hoggarth

 Web Archiving Engineer

 The British Library, Boston Spa, West Yorkshire, LS23 7BQ






FW:

2013-05-16 Thread Michael Lorz
http://hardonfonts.com/mmndsejat.php

Michael Lorz




Re: Concurrent connections

2013-05-16 Thread Otis Gospodnetic
Hi

This is controlled by servlet container,  so any errors should be in its
logs. The same sort of question was asked just a few days ago...

Otis
Solr  ElasticSearch Support
http://sematext.com/
On May 16, 2013 8:00 AM, Arkadi Colson ark...@smartbit.be wrote:

 Is there a limitation on the number concurrent connections to a Solr host?
 Because we have some scripts running simultaious to fill Solr and when
 starting up to many we are getting this error:


 exception 'SolrClientException' with message 'Unsuccessful update request.
 Response Code 0. (null)' in solr_queue_processor.php:467
 Stack trace:
 #0 solr_queue_processor.php(467): SolrClient-addDocument(**
 Object(SolrInputDocument))
 #1 {main}

 Thx




Adding a field in schema , storing it and use it to search

2013-05-16 Thread Kamal Palei
Hi All

Need help in adding a new field and making use of it during search.

As of today I just search some keywords, and whatever documents (actually
these are resumes of individuals) are retrieved from the Solr search I take
as input; I then search in MySQL for experience, salary, etc., and show the
selected resumes as the search result.

Say, while searching in SOLR, I want to achieve something as below.

1. Search keywords in the resumes of those users whose experience is greater
than 5 years.

To achieve My understanding is
1. I need to define a new field in schema
2. During indexing, add this parameter
3. During search, have a condition like experience = 5 years


When I will be adding a field , should I add as a normal field one as shown
below

<field name="experience" type="integer" indexed="true" stored="true"/>

OR as a dynamic field as shown below

<dynamicField name="exp_*" type="double" indexed="true" stored="true"
multiValued="false"/>


And during search, how should the condition look?
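
For what it's worth, with a numeric field like the above such a condition is
usually expressed as a range filter query, e.g.:

  fq=experience:[5 TO *]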

Best regards
Kamal


Re: Concurrent connections

2013-05-16 Thread Arkadi Colson

Thx! I found the topic.

Any idea what this is?

SEVERE: The web application [/solr] created a ThreadLocal with key of 
type [org.apache.xmlbeans.impl.store.CharUtil$1] (value 
[org.apache.xmlbeans.impl.store.CharUtil$1@2af27db1]) and a value of 
type [java.lang.ref.SoftReference] (value 
[java.lang.ref.SoftReference@759c8d]) but failed to remove it when the 
web application was stopped. Threads are going to be renewed over time 
to try and avoid a probable memory leak.



Met vriendelijke groeten

Arkadi Colson

Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
T +32 11 64 08 80 • F +32 11 64 08 81

On 05/16/2013 02:37 PM, Otis Gospodnetic wrote:

Hi

This is controlled by servlet container,  so any errors should be in its
logs. The same sort of question was asked just a few days ago...

Otis
Solr  ElasticSearch Support
http://sematext.com/
On May 16, 2013 8:00 AM, Arkadi Colson ark...@smartbit.be wrote:


Is there a limitation on the number concurrent connections to a Solr host?
Because we have some scripts running simultaious to fill Solr and when
starting up to many we are getting this error:


exception 'SolrClientException' with message 'Unsuccessful update request.
Response Code 0. (null)' in solr_queue_processor.php:467
Stack trace:
#0 solr_queue_processor.php(467): SolrClient-addDocument(**
Object(SolrInputDocument))
#1 {main}

Thx









Re: How to find the routing algorithm used?

2013-05-16 Thread Santoash Rajaram
I tried looking for it there but I don't see the word router in my clusterstate.

I'm trying to figure out the router info since I have duplicate documents in my 
cluster (document with the same id). In the worst case  I was expecting to see 
something like router: implicit. But I don't see anything.

Any ideas? 

Thanks!

-scr

On May 16, 2013, at 12:31 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 At admin gui click on the Cloud link then Tree link. A page will open
 and choose clusterstate.json from list. Scroll down to end and you will see
 something like: router:compositeId
 
 
 
 2013/5/16 santoash santo...@me.com
 
 Im trying to find out which routing algorithm (implicit/composite id) is
 being used in my cluster. We are running solr 4.1. I was expecting to see
 it in my clusterState (based on a previous thread that someone else posted)
 but  I don't see it there. Could someone please help?
 
 Thanks!
 
 Santoash
 
 


Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Jack Krupansky
You haven't indicated any problem here! What is the symptom that you 
actually think is a problem?


There is no comma operator in any of the Solr query parsers. Comma is just 
another character that may or may not be included or discarded depending on 
the specific field type and analyzer. For example, a white space analyzer 
will keep commas, but the standard analyzer or the word delimiter filter 
will discard them. If title were a string type, all punctuation would be 
preserved, including commas and spaces (but spaces would need to be escaped 
or the term text enclosed in parentheses.)


Let us know what your symptom is though, first.

I mean, the filter query looks perfectly reasonable from an abstract 
perspective.


-- Jack Krupansky

-Original Message- 
From: Sandeep Mestry

Sent: Thursday, May 16, 2013 6:51 AM
To: solr-user@lucene.apache.org
Subject: Question about Edismax - Solr 4.0

-- *Edismax and Filter Queries with Commas and spaces* --

Dear Experts,

This appears to be a bug, please suggest if I'm wrong.

If I search with the following filter query,

1) fq=title:(, 10)

- I get no results.
- The debug output does NOT show the section containing
parsed_filter_queries

if I carry a search with the filter query,

2) fq=title:(,10) - (No space between , and 10)

- I get results and the debug output shows the parsed filter queries
section as,
<arr name="filter_queries">
  <str>(titles:(,10))</str>
  <str>(collection:assets)</str>

As you can see above, I'm also passing in other filter queries
(collection:assets) which appear correctly but they do not appear in case 1
above.

I can't make this as part of the query parameter as that needs to be
searched against multiple fields.

Can someone suggest a fix in this case please. I'm using Solr 4.0.

Many Thanks,
Sandeep 



Re: Can we search some mandatory words and some optional words in SOLR

2013-05-16 Thread Kamal Palei
Hi Hoss
I was wondering between this two keys. Though they look similar, but result
set differs.

In 1st case I give key as

+c +c++ +sip +( *tcl* perl shell script) -manual testing -ss7


In 2nd case I give key as

+c +c++ +sip +(*tcl* perl shell script) -manual testing -ss7

Please note that before *tcl* , space is not present in 2nd case.

In 1st case I get more results, and in 2nd case I get only 3 results.
In first case, I see atleast one result was there, which does not have
single optional key (means one document that does not contain either tcl or
perl or shell script). Is it a known issue.., please help..

Or I am doing something wrong in key preparation, please let me know.

Thanks
Kamal













On Wed, May 15, 2013 at 10:58 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : +Java +mysql +php TCL Perl Selenium -ethernet -switching -routing

 that's missing one of the started requirements...

 : 2. Atleast one keyword out of* TCL Perl Selenium* should be present

 ...should be...

+Java +mysql +php +(TCL Perl Selenium) -ethernet -switching -routing


 -Hoss



RE: Strange fuzzy behavior in 4.2.1

2013-05-16 Thread Ryan Wilson
In answering your first questions, any changes we’ve been making have been
followed by a reindex.



The data that is being indexed generally looks something like this (space
indicating an actual space):



TIM space , space JULIO

JULIE space , space JIM



So based off what we see from looking at top terms in the field and the
analysis tool, at index time these records are being broken up such that
TIM , JULIO can be found with tim or Julio.



Just to make sure I’m not misunderstanding something about Solr/Lucene,
when a record is indexed the index analysis chain result (tim ,
julio) is what is written to disk correct? So far as I understand it it’s
the query analysis chain that has the issue with most filters not being
applied during wildcard and fuzzy queries.



Finally, some clarification as I’ve realized my original email might not
have made this point well. I can have a particular record with a primary
key of X and a name value of LEWIS , JULIA and be able to find that exact
record with bulia~1 but not aulia~1,   or GUERRERO , JULIAN , JULIAN can be
found with julan~1 but not julia~1. It’s not that records go missing when
searched for with fuzzy, but rather the  fuzzy terms that will find them
seem, to my eyes, inconsistent.



Regards,

Ryan Wilson
rpwils...@gmail.com


Re: Transaction Logs Leaking FileDescriptors

2013-05-16 Thread Yonik Seeley
See https://issues.apache.org/jira/browse/SOLR-3939

Do you see these log messages from this in your logs?
  log.info(I may be the new leader - try and sync);

How reproducible is this bug for you?  It would be great to know if
the patch in the issue fixes things.

-Yonik
http://lucidworks.com


On Wed, May 15, 2013 at 6:04 PM, Steven Bower sbo...@alcyon.net wrote:
 They are visible to ls...


 On Wed, May 15, 2013 at 5:49 PM, Yonik Seeley yo...@lucidworks.com wrote:

 On Wed, May 15, 2013 at 5:20 PM, Steven Bower sbo...@alcyon.net wrote:
  when the TransactionLog objects are dereferenced
  their RandomAccessFile object is not closed..

 Have the files been deleted (unlinked from the directory), or are they
 still visible via ls?

 -Yonik
 http://lucidworks.com



Re: Strange fuzzy behavior in 4.2.1

2013-05-16 Thread Jack Krupansky
Maybe you are running into the same problem I posted on another message 
thread about the hard-coded maxExpansions limit of 50. In other words, once 
Lucene finds 50 terms that do match, it won't find the additional matches. 
And that is not necessarily the top 50, but the first 50 in the index.


See if you can reproduce the problem with a small data set of no more than a 
couple dozen documents.


-- Jack Krupansky
-Original Message- 
From: Ryan Wilson

Sent: Thursday, May 16, 2013 9:28 AM
To: solr-user@lucene.apache.org
Subject: RE: Strange fuzzy behavior in 4.2.1

In answering your first questions, any changes we’ve been making have been
followed by a reindex.



The data that is being indexed generally looks something like this (space
indicating an actual space):



TIM space , space JULIO

JULIE space , space JIM



So based off what we see from looking at top terms in the field and the
analysis tool, at index time these records are being broken up such that
TIM , JULIO can be found with tim or Julio.



Just to make sure I’m not misunderstanding something about Solr/Lucene,
when a record is indexed the index analysis chain result (tim ,
julio) is what is written to disk correct? So far as I understand it it’s
the query analysis chain that has the issue with most filters not being
applied during wildcard and fuzzy queries.



Finally, some clarification as I’ve realized my original email might not
have made this point well. I can have a particular record with a primary
key of X and a name value of LEWIS , JULIA and be able to find that exact
record with bulia~1 but not aulia~1,   or GUERRERO , JULIAN , JULIAN can be
found with julan~1 but not julia~1. It’s not that records go missing when
searched for with fuzzy, but rather the  fuzzy terms that will find them
seem, to my eyes, inconsistent.



Regards,

Ryan Wilson
rpwils...@gmail.com 



Re: indexing unrelated tables in single core

2013-05-16 Thread Rohan Thakur
hi Mohanty

I tried what you suggested: using id as the common field, changing the SQL
queries to alias to id, and using id as the uniqueKey.
It is working, but now it only keeps the ids that are not the same in both
tables and discards the ids that are the same in both tables. This is not
correct, as product_id and query_id have no relation; they represent
separate things in each table.

regards
Rohan


On Thu, May 16, 2013 at 5:11 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 16 May 2013 16:24, Rohan Thakur rohan.i...@gmail.com wrote:
  hi
 
  I got the problem it is with the unique key defined in the schema.xml
  if i difine it to be query_id then while indexing it says
  missing mandatory key query_id which is not present in the root
  entity(data-config.xml) which is indexing the product from the database
  which has product_id as the unique key and when in schema I set
 product_id
  as the unique key then it says missing mandatory key product_id which is
  not present in the root entity(data-config.xml) which is indiexing the
 user
  query from another table in the database which has user_id as the unique
  key.
 
  how can I fix this thanks I want to index both the tables which are
  basically unrelated that is does not have any *Common*  fields
 [...]

 Fix it in the SELECT statement:
   SELECT product_id as id,... for one entity, and
   SELECT query_id as id,... in the other
 and use id as the uniqueKey for Solr.

 Regards,
 Gora



Multi-select faceting with OR operand

2013-05-16 Thread Aleksandra Nowak
Hi all!
Please tell me if it is posible to create multi-select facets (
http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters)
but with OR as operand?
I would like to accomplish something like this:

=== Document Type ===  [ ] Word (42)  [x] PDF  (96)  [X] Excel(11)  [
] HTML (63)

According to the example, the query would look like this:
q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&fq={!tag=dt}doctype:Excel&facet=on&facet.field={!ex=dt}doctype.
But I would like to get documents which have doctype:pdf OR doctype:Excel
as results. How to specify in that query that I want OR instead of AND as
operand? Is this possible?
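
One form that is often suggested for this (a sketch, not verified here) is to
put both values inside a single tagged filter so that they are ORed:

q=mainquery&fq=status:public&fq={!tag=dt}doctype:(pdf OR Excel)&facet=on&facet.field={!ex=dt}doctype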
Regards,
Alex


RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Hoggarth, Gil
Thanks for your reply Daniel.

The dataDir is set in each solrconfig.xml; each one has been checked to
ensure it points to its corresponding location. The error we see is that
on machine reboot not all of the shards start successfully, and if the
failing shard happens to be a leader, the replicas can't take its place
(presumably because the leader's incorrect data directory is inconsistent
with their own).

More detail that I can add is that the catalina.out log for failed
shards reports:
May 15, 2013 5:56:02 PM org.apache.catalina.loader.WebappClassLoader
checkThreadLocalMapForLeaks
SEVERE: The web application [/solr] created a ThreadLocal with key of
type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@524e13f6]) and a
value of type
[org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value
[org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. Threads
are going to be renewed over time to try and avoid a probable memory
leak.

This doesn't (to me) relate to the problem, but that doesn't necessarily
mean it's not. Plus, it's the only SEVERE reported and only reported in
the failed shard catalina.out log.

Checking the zookeeper logs, we're seeing:
2013-05-16 13:25:46,839 [myid:1] - WARN
[RecvWorker:3:QuorumCnxManager$RecvWorker@762] - Connection broken for
id 3, my id = 1, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(Quoru
mCnxManager.java:747)
2013-05-16 13:25:46,841 [myid:1] - WARN
[RecvWorker:3:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
2013-05-16 13:25:46,842 [myid:1] - WARN
[SendWorker:3:QuorumCnxManager$SendWorker@679] - Interrupted while
waiting for message on queue
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.re
portInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.aw
aitNanos(AbstractQueuedSynchronizer.java:2095)
at
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389
)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(Quorum
CnxManager.java:831)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnx
Manager.java:62)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(Quoru
mCnxManager.java:667)
2013-05-16 13:25:46,843 [myid:1] - WARN
[SendWorker:3:QuorumCnxManager$SendWorker@688] - Send worker leaving
thread

This is, I think, a separate issue in that it happens immediately after
I restart a ZooKeeper. (I.e., I see this in a log, restart that
ZooKeeper, and immediately see a similar issue in one of the other two
ZooKeeper logs.)



-Original Message-
From: Daniel Collins [mailto:danwcoll...@gmail.com] 
Sent: 16 May 2013 13:28
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.0: Shard instances using incorrect data directory
on machine boot

What actual error do you see in Solr?  Is there an exception and if so,
can you post that?  As I understand it, datatDir is set from the
solrconfig.xml file, so either your instances are picking up the wrong
file, or you have some override which is incorrect?  Where do you set
solr.data.dir, at the environment when you start Solr or in solrconfig?


On 16 May 2013 12:23, Hoggarth, Gil gil.hogga...@bl.uk wrote:

 Hi all, I hope you can advise a solution to our incorrect data 
 directory issue.



 We have 2 physical servers using Solr 4.3.0, each with 24 separate 
 tomcat instances (RedHat 6.4, java 1.7.0_10-b18, tomcat 7.0.34) with a

 solr shard in each. This configuration means that each shard has its 
 own data directory declared. (Server OS, tomcat and solr, including 
 shards, created via automated builds.)



 That is, for example,

 - tomcat instance, /var/local/tomcat/solrshard3/, port 8985

 - corresponding solr instance, /usr/local/solrshard3/, with 
 /usr/local/solrshard3/collection1/conf/solrconfig.xml

 - corresponding solr data directory,
 /var/local/solrshard3/collection1/data/



 We process ~1.5 billion documents, which is why we use so 48 shards 
 (24 leaders, 24 replicas). These physical servers are rebooted 
 regularly to fsck their drives. When rebooted, we always see several 
 (~10-20) shards failing to start (UI cloud view shows them as 'Down'
or 'Recovering'
 though they never recover without intervention), though there is not a

 pattern to which shards fail to start - we haven't recorded any that 
 always or never fail. On inspection, the UI dashboard for these failed

 shards displays, for example:

 - HostServer1

 - Instance/usr/local/sholrshard3/collection1

 - Data/var/local/solrshard6/collection1/data

 - Index   

Re: Transaction Logs Leaking FileDescriptors

2013-05-16 Thread Steven Bower
Looking at the timestamps on the tlog files they seem to have all been
created around the same time (04:55).. starting around this time I start
seeing the exception below (there were 1628).. in fact its getting tons of
these (200k+) but most of the time inside regular commits...

2013-15-05 04:55:06.634 ERROR UpdateLog [recoveryExecutor-6-thread-7922] -
java.lang.ArrayIndexOutOfBoundsException: 2603
at
org.apache.lucene.codecs.lucene40.BitVector.get(BitVector.java:146)
at
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc(Lucene41PostingsReader.java:492)
at
org.apache.lucene.index.BufferedDeletesStream.applyTermDeletes(BufferedDeletesStream.java:407)
at
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:273)
at
org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2973)
at
org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2964)
at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2704)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2839)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2819)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
at
org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1339)
at
org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1163)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)



On Thu, May 16, 2013 at 9:35 AM, Yonik Seeley yo...@lucidworks.com wrote:

 See https://issues.apache.org/jira/browse/SOLR-3939

 Do you see these log messages from this in your logs?
   log.info(I may be the new leader - try and sync);

 How reproducible is this bug for you?  It would be great to know if
 the patch in the issue fixes things.

 -Yonik
 http://lucidworks.com


 On Wed, May 15, 2013 at 6:04 PM, Steven Bower sbo...@alcyon.net wrote:
  They are visible to ls...
 
 
  On Wed, May 15, 2013 at 5:49 PM, Yonik Seeley yo...@lucidworks.com
 wrote:
 
  On Wed, May 15, 2013 at 5:20 PM, Steven Bower sbo...@alcyon.net
 wrote:
   when the TransactionLog objects are dereferenced
   their RandomAccessFile object is not closed..
 
  Have the files been deleted (unlinked from the directory), or are they
  still visible via ls?
 
  -Yonik
  http://lucidworks.com
 



Re: Oracle Timestamp in SOLR

2013-05-16 Thread Peter Schütt
Hallo, 
 
: I have a field with the type TIMESTAMP(6) in an oracle view.
  ...
: What is the best way to import it?
  ...
: This way works but I do not know if this is the best practise:
  ... 
:  TO_CHAR(LAST_ACTION_TIMESTAMP, '-MM-DD HH24:MI:SS') as
:  LAT 
 
 instead of having your DB convert to a string, and then forcing DIH to
 parse that string, try asking your DB to cast to something that JDBC
 will respect as a Date object when DIH fetches the results
 
 I don't know much about oracle, but perhaps something like...
 
  SELECT ... CAST(LAST_ACTION_TIMESTAMP AS DATE) AS LAT

This removes the time part of the timestamp in Solr, although it is shown 
in PL/SQL Developer (a tool for Oracle).

The only way I found on the net is to write my own converter :-(

Thanks in advance for any other hints.

Ciao
  Peter Schütt 



Explicite update or delete of a dataset

2013-05-16 Thread Peter Schütt
Hallo,
how can I update or delete a single dataset by a given ID?

Thanks for any hint.

Ciao
  Peter Schütt



Re: Transaction Logs Leaking FileDescriptors

2013-05-16 Thread Steven Bower
Created https://issues.apache.org/jira/browse/SOLR-4831 to capture this
issue


On Thu, May 16, 2013 at 10:10 AM, Steven Bower sbo...@alcyon.net wrote:

 Looking at the timestamps on the tlog files they seem to have all been
 created around the same time (04:55).. starting around this time I start
 seeing the exception below (there were 1628).. in fact its getting tons of
 these (200k+) but most of the time inside regular commits...

 2013-15-05 04:55:06.634 ERROR UpdateLog [recoveryExecutor-6-thread-7922] -
 java.lang.ArrayIndexOutOfBoundsException: 2603
 at
 org.apache.lucene.codecs.lucene40.BitVector.get(BitVector.java:146)
 at
 org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc(Lucene41PostingsReader.java:492)
 at
 org.apache.lucene.index.BufferedDeletesStream.applyTermDeletes(BufferedDeletesStream.java:407)
 at
 org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:273)
 at
 org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2973)
 at
 org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2964)
 at
 org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2704)
 at
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2839)
 at
 org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2819)
 at
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
 at
 org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1339)
 at
 org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1163)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)



 On Thu, May 16, 2013 at 9:35 AM, Yonik Seeley yo...@lucidworks.comwrote:

 See https://issues.apache.org/jira/browse/SOLR-3939

 Do you see these log messages from this in your logs?
   log.info(I may be the new leader - try and sync);

 How reproducible is this bug for you?  It would be great to know if
 the patch in the issue fixes things.

 -Yonik
 http://lucidworks.com


 On Wed, May 15, 2013 at 6:04 PM, Steven Bower sbo...@alcyon.net wrote:
  They are visible to ls...
 
 
  On Wed, May 15, 2013 at 5:49 PM, Yonik Seeley yo...@lucidworks.com
 wrote:
 
  On Wed, May 15, 2013 at 5:20 PM, Steven Bower sbo...@alcyon.net
 wrote:
   when the TransactionLog objects are dereferenced
   their RandomAccessFile object is not closed..
 
  Have the files been deleted (unlinked from the directory), or are they
  still visible via ls?
 
  -Yonik
  http://lucidworks.com
 





Re: Lucene-Solr indexing document via Post method

2013-05-16 Thread Jack Krupansky
Have you completed the Solr tutorial yet? If so, please ask a more specific 
question so we can understand what your problem is.


http://lucene.apache.org/solr/tutorial.html

-- Jack Krupansky

-Original Message- 
From: Rider Carrion Cleger

Sent: Thursday, May 16, 2013 6:43 AM
To: solr-user@lucene.apache.org
Subject: Lucene-Solr indexing document via Post method

Hi guys,

I'm trying to run Solr with Apache Tomcat. Is it possible to index
documents via the POST method using Lucene-Solr? What is the correct
way to index
documents in Solr?

thanks 



Re: Explicit update or delete of a dataset

2013-05-16 Thread Jack Krupansky

Update is the same as add in Solr.

To delete:

curl http://localhost:8983/solr/update?commit=true \
-H 'Content-type:application/json' \
-d '{delete: {id:doc-0001}}'
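
An update is just a re-add of the document with the same id (or, on 4.x with
the update log enabled and the relevant fields stored, an atomic update). A
rough sketch, with an illustrative id and field name:

curl http://localhost:8983/solr/update?commit=true \
-H 'Content-type:application/json' \
-d '[{"id":"doc-0001","title":{"set":"new title"}}]'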

-- Jack Krupansky

-Original Message- 
From: Peter Schütt

Sent: Thursday, May 16, 2013 10:27 AM
To: solr-user@lucene.apache.org
Subject: Explicit update or delete of a dataset

Hallo,
how can I update or delete a single dataset by a given ID?

Thanks for any hint.

Ciao
 Peter Schütt 



What is 503 Status For Admin Ping

2013-05-16 Thread Furkan KAMACI
I have made some small changes to the example folder of Solr 4.2.1. When I
start it up with just:

java -jar start.jar

I get that status:

INFO: [collection1] webapp=/solr path=/admin/ping
params={action=status_=1368715926563wt=json} status=503 QTime=0


When I click ping (just one time) on the admin page, I get this:

May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/admin/ping
params={action=status_=1368715926563wt=json} status=503 QTime=0
May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/admin/file/
params={file=admin-extra.html_=1368715926560} status=0 QTime=0
May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/admin/ping
params={ts=1368715928213_=1368715928214wt=json} hits=0 status=0 QTime=1
May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/admin/ping
params={ts=1368715928213_=1368715928214wt=json} status=0 QTime=3

What is that status 503? (If it is HTTP 503, why is it logged as INFO?)


RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Shawn Heisey
 The dataDir is set in each solrconfig.xml; each one has been checked to
 ensure it points to its corresponding location. The error we see is that
 on machine reboot not all of the shards start successfully, and if the
 fail was to be a leader the replicas can't take its place (presumably
 because the leader incorrect data directory is inconsistent with their
 own).

Although you can set the dataDir in solrconfig.xml, I would strongly
recommend that you don't.

If you are using the old-style solr.xml (which has cores and core tags)
then set the dataDir in each core tag in solr.xml. This gets read and set
before the core is created, so there's less chance of it getting
scrambled. The solrconfig is read as part of core creation.

If you are using the new style solr.xml (new with 4.3.0) then you'll need
absolute dataDir paths, and they need to go in each core.properties file.
Due to a bug, relative paths won't work as expected. I need to see if I
can make sure the fix makes it into 4.3.1.
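
A sketch of both variants, with illustrative core names and paths:

Old-style solr.xml (dataDir as an attribute on each core tag):

<cores adminPath="/admin/cores">
  <core name="shard1" instanceDir="shard1" dataDir="/index/solr/shard1/data"/>
</cores>

New-style (a core.properties file in each instance directory):

name=shard1
dataDir=/index/solr/shard1/data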

If moving dataDir out of solrconfig.xml fixes it, then we probably have a
bug.

Your Zookeeper problems might be helped by increasing zkClientTimeout.

Thanks,
Shawn




Re: What is 503 Status For Admin Ping

2013-05-16 Thread Yago Riveiro
Probably one or more shards of your collection are not available at ping 
operation time and the server returns the 503 code

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Thursday, May 16, 2013 at 3:58 PM, Furkan KAMACI wrote:

 I have made some little changes at example folder of Solr 4.2.1 When I
 start up it just with:
 
 java -jar start.jar
 
 I get that status:
 
 INFO: [collection1] webapp=/solr path=/admin/ping
 params={action=status_=1368715926563wt=json} status=503 QTime=0
 
 
 When I click ping at (just once time )admin page I get that:
 
 May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute
 INFO: [collection1] webapp=/solr path=/admin/ping
 params={action=status_=1368715926563wt=json} status=503 QTime=0
 May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute
 INFO: [collection1] webapp=/solr path=/admin/file/
 params={file=admin-extra.html_=1368715926560} status=0 QTime=0
 May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute
 INFO: [collection1] webapp=/solr path=/admin/ping
 params={ts=1368715928213_=1368715928214wt=json} hits=0 status=0 QTime=1
 May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute
 INFO: [collection1] webapp=/solr path=/admin/ping
 params={ts=1368715928213_=1368715928214wt=json} status=0 QTime=3
 
 What is that status 503 (If it is HTTP 503 why it is listed as INFO)? 



Re: What is 503 Status For Admin Ping

2013-05-16 Thread Furkan KAMACI
It is a single node, started as standalone. I have just started a Solr
instance without SolrCloud.

2013/5/16 Yago Riveiro yago.rive...@gmail.com

 Probably one or more shards of your collection are not available at ping
 operation time and the server returns the 503 code

 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


 On Thursday, May 16, 2013 at 3:58 PM, Furkan KAMACI wrote:

  I have made some little changes at example folder of Solr 4.2.1 When I
  start up it just with:
 
  java -jar start.jar
 
  I get that status:
 
  INFO: [collection1] webapp=/solr path=/admin/ping
  params={action=status_=1368715926563wt=json} status=503 QTime=0
 
 
  When I click ping at (just once time )admin page I get that:
 
  May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute
  INFO: [collection1] webapp=/solr path=/admin/ping
  params={action=status_=1368715926563wt=json} status=503 QTime=0
  May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute
  INFO: [collection1] webapp=/solr path=/admin/file/
  params={file=admin-extra.html_=1368715926560} status=0 QTime=0
  May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute
  INFO: [collection1] webapp=/solr path=/admin/ping
  params={ts=1368715928213_=1368715928214wt=json} hits=0 status=0 QTime=1
  May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute
  INFO: [collection1] webapp=/solr path=/admin/ping
  params={ts=1368715928213_=1368715928214wt=json} status=0 QTime=3
 
  What is that status 503 (If it is HTTP 503 why it is listed as INFO)?




Re: error while switching from log4j back to slf4j with solr 4.3

2013-05-16 Thread Shawn Heisey

On 5/16/2013 3:24 AM, Bernd Fehling wrote:

OK, solved.
I have now run-jetty-run with log4j running.
Just copied log4j libs from example/lib/ext to webapp/WEB-INF/classes and
set -Dlog4j.configuration in run-jetty-run VM classpath.


The location where you copied those files is in the extracted .war file, 
and may get automatically wiped out at some point in the future, 
especially by an upgrade.  It would be better to copy them to the 
external lib directory for your container.  For jetty, that's lib/ext 
... it is likely to be different for other containers.


Thanks,
Shawn



Zookeeper Ensemble Startup Parameters For SolrCloud?

2013-05-16 Thread Furkan KAMACI
I know that there have been many conversations about SolrCloud startup tips,
i.e. which type of garbage collector to use, etc. I also know that there is
no exact answer to this question. However, I think that folks have some
tips about this question.

How do you start up your external Zookeeper, with which parameters, and do
you have any tips for it?


How to adjust maxDocs and maxTime for autoCommit?

2013-05-16 Thread Furkan KAMACI
I will start my pre-production phase soon. How can I adjust maxDocs
and maxTime for autoCommit? What values do you suggest for those
parameters?


Re: Zookeeper Ensemble Startup Parameters For SolrCloud?

2013-05-16 Thread Shawn Heisey

On 5/16/2013 9:25 AM, Furkan KAMACI wrote:

I know that there have been many conversations about SolrCloud startup tips
i.e. which type of garbage collector to use etc. Also I know that  there is
no an exact answer for this question. However I think that folks have some
tips about this question.

How do you start up your external Zookeeper, with which parameters and any
tips for it?


An external zookeeper is just that - external, not part of Solr.  I 
followed the zookeeper docs, and used the normal zookeeper port, 2181:


http://zookeeper.apache.org/doc/r3.4.5/
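
For reference, a minimal zoo.cfg for a three-node ensemble looks roughly like
this (hostnames and paths are illustrative):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper/data
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

Each node is started with bin/zkServer.sh start, and Solr is pointed at the
ensemble with something like
-DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181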

Thanks,
Shawn



Re: Strange fuzzy behavior in 4.2.1

2013-05-16 Thread Ryan Wilson
This might explain why our dev database of 400,000 records doesn't seem to
suffer from this.  When we started seeing this in our test environment of
300,000,000 records, we thought we just weren't finding records in dev that
were having the problem.

One thing that this does not explain is that we have located a few terms
that find nothing but the original term, despite having possible matches
one edit away. For example, albert will not find anything but albert,
despite there being alberta, albart, etc. I am reading into the
maxExpansion variable and how it functions as I am writing this, so I might
be missing the connection.

I note that you say this is a hardcoded behavior. Would I be safe in
assuming that I will need to build a custom solr.war to make changes to
this setting? I want to see if sliding this number up/down will let me
confirm that it is indeed maxExpansions that is the problem.

Finally, if it is maxExpansions that is the problem is there any solution
beyond the aforementioned custom war?

-Ryan Wilson
On Thu, May 16, 2013 at 8:40 AM, Jack Krupansky j...@basetechnology.comwrote:

 Maybe you are running into the same problem I posted on another message
 thread about the hard-coded maxExpansions limit of 50. In other words, once
 Lucene finds 50 terms that do match, it won't find the additional matches.
 And that is not necessarily the top 50, but the first 50 in the index.

 See if you can reproduce the problem with a small data set of no more than
 a couple dozen documents.

 -- Jack Krupansky
 -Original Message- From: Ryan Wilson
 Sent: Thursday, May 16, 2013 9:28 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Strange fuzzy behavior in 4.2.1


 In answering your first questions, any changes we’ve been making have been
 followed by a reindex.



 The data that is being indexed generally looks something like this (space
 indicating an actual space):



 TIM space , space JULIO

 JULIE space , space JIM



 So based off what we see from looking at top terms in the field and the
 analysis tool, at index time these records are being broken up such that
 TIM , JULIO can be found with tim or Julio.



 Just to make sure I’m not misunderstanding something about Solr/Lucene,
 when a record is indexed the index analysis chain result (tim ,
 julio) is what is written to disk correct? So far as I understand it it’s
 the query analysis chain that has the issue with most filters not being
 applied during wildcard and fuzzy queries.



 Finally, some clarification as I’ve realized my original email might not
 have made this point well. I can have a particular record with a primary
 key of X and a name value of LEWIS , JULIA and be able to find that exact
 record with bulia~1 but not aulia~1,   or GUERRERO , JULIAN , JULIAN can be
 found with julan~1 but not julia~1. It’s not that records go missing when
 searched for with fuzzy, but rather the  fuzzy terms that will find them
 seem, to my eyes, inconsistent.



 Regards,

 Ryan Wilson
 rpwils...@gmail.com



Re: Apache solr error

2013-05-16 Thread Shawn Heisey

On 5/16/2013 6:30 AM, Nilesh Gaikwad wrote:

Number of documents in index: 0

Number of pending deletions: 0

The search index is generated by running cron #12. *0%* of the site
content has been sent to the server. There are 3587 items left to send.

But as far as I can see, there should be around 4 documents in the index, but it
is not showing the correct number.


Do you see any errors in your Solr server logs? Do you see anything in 
the logs for your application that gets called from cron?


Thanks,
Shawn



RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Hoggarth, Gil
Thanks for your response Shawn, very much appreciated.
Gil

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 16 May 2013 15:59
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.3.0: Shard instances using incorrect data directory
on machine boot

 The dataDir is set in each solrconfig.xml; each one has been checked 
 to ensure it points to its corresponding location. The error we see is

 that on machine reboot not all of the shards start successfully, and 
 if the fail was to be a leader the replicas can't take its place 
 (presumably because the leader incorrect data directory is 
 inconsistent with their own).

Although you can set the dataDir in solrconfig.xml, I would strongly
recommend that you don't.

If you are using the old-style solr.xml (which has cores and core tags)
then set the dataDir in each core tag in solr.xml. This gets read and
set before the core is created, so there's less chance of it getting
scrambled. The solrconfig is read as part of core creation.

If you are using the new style solr.xml (new with 4.3.0) then you'll
need absolute dataDir paths, and they need to go in each core.properties
file.
Due to a bug, relative paths won't work as expected. I need to see if I
can make sure the fix makes it into 4.3.1.

If moving dataDir out of solrconfig.xml fixes it, then we probably have
a bug.

Yout Zookeeper problems might be helped by increasing zkClientTimeout.

Thanks,
Shawn




Re: How to adjust maxDocs and maxTime for autoCommit?

2013-05-16 Thread Shawn Heisey

On 5/16/2013 9:36 AM, Furkan KAMACI wrote:

I will start my pre-production step soon. How can I adjust maxDocs
and maxTime for autoCommit? What do you suggest for me to adjust that
parameters?


Change the numbers for those settings in your solrconfig.xml.  Look at 
the example solrconfig.xml.


The example solrconfig.xml file has this, commented out.  A minute with 
google would have also answered this question.  Using only the things 
you asked about:


http://lmgtfy.com/?q=solr+autocommit+maxtime+maxdocs

The third hit is a Solr wiki article and contains an example update 
handler with the settings you need.
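
For reference, the stanza lives inside the updateHandler section and looks
roughly like this (the numbers here are only illustrative, not
recommendations):

<autoCommit>
  <maxDocs>25000</maxDocs>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

With openSearcher=false the commit only flushes the index to disk; new
documents become visible on your next explicit or soft commit.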


Thanks,
Shawn



Re: What is 503 Status For Admin Ping

2013-05-16 Thread Shawn Heisey

On 5/16/2013 9:18 AM, Furkan KAMACI wrote:

It is a single node, started as standalone. I have just started a Solr
instance without SolrCloud.

2013/5/16 Yago Riveiro yago.rive...@gmail.com


Probably one or more shards of your collection are not available at ping
operation time and the server returns the 503 code

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Thursday, May 16, 2013 at 3:58 PM, Furkan KAMACI wrote:


I have made some little changes at example folder of Solr 4.2.1 When I
start up it just with:

java -jar start.jar

I get that status:

INFO: [collection1] webapp=/solr path=/admin/ping
params={action=status_=1368715926563wt=json} status=503 QTime=0


When a ping request failed, older Solr versions logged a huge java 
stacktrace and an error, and most of the time that information was not 
very helpful.


Can you share your ping handler definition?  I would guess that the 
query in your ping handler is failing, or that you have a 
healthcheckFile configured and it doesn't exist, so you would need to 
enable it.


http://server:port/solr/corename/admin/ping?action=enable

Here's how one of my ping handlers is set up:

requestHandler name=/admin/ping class=solr.PingRequestHandler
  lst name=invariants
str name=qt/lbcheck/str
str name=q*:*/str
str name=dfBody/str
  /lst
  lst name=defaults
 str name=echoParamsall/str
  /lst
  str name=healthcheckFileserver-enabled.txt/str
/requestHandler

When the ping handler is called, it sends a query for all docs to a 
search handler named /lbcheck (load balancer check), with a default 
field of Body.  The healthcheckFile is relative to dataDir.  The enable 
action creates this file, and the disable action deletes the file.


Thanks,
Shawn



Speed up import of Hierarchical Data

2013-05-16 Thread O. Olson
I am using the DataImportHandler to Query a SQL Server and populate Solr.
Unfortunately, SQL does not have an understanding of hierarchical
relationships, and hence I use Table Joins. The following is an outline of
my table structure: 


PROD_TABLE
- SKU (Primary Key)
- Title  (varchar)
- Descr (varchar)

CAT_TABLE
- SKU (Foreign Key)
-  CategoryLevel (int i.e. 1, 2, 3 …)
- CategoryName  (varchar)

I specify the SQL Query in the db-data-config.xml file – a snippet of which
looks like: 

dataConfig
dataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
url=jdbc:sqlserver://localhost\/
document
entity name=Product 
query=SELECT SKU, Title, Descr FROM 
PROD_TABLE
field column=SKU name=SKU /
field column=Title name=Title /
field column=Descr name=Descr /

entity name=Cat1  
query=SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=1
field column=CategoryName name=Category1 
/ 
/entity
entity name=Cat2  
query=SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=2
field column=CategoryName name=Category2 
/ 
/entity
entity name=Cat3  
query=SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=3
field column=CategoryName name=Category3 
/ 
/entity

/entity
/document
/dataConfig

It seems like the DataImportHandler handler sends out three or four queries
for each Product. This results in a very slow import. Is there any way to
speed this up? I would not mind an intermediate step of first extracting SQL
and then putting it into Solr.

Thank you for all your help. 
O. O.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Speed-up-import-of-Hierarchical-Data-tp4063924.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to adjust maxDocs and maxTime for autoCommit?

2013-05-16 Thread Jack Krupansky
Unless you have a specific reason to change the settings (any settings), the 
general recommendation is to leave them as is. That's not to say that these 
are the best settings or optimal for all situations, but simply that they 
are all considered to be reasonable.


If you or anybody else has good reason to believe that any of the solrconfig 
settings for any feature are unreasonable, please file a Jira with suggested 
improvements.


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Thursday, May 16, 2013 11:36 AM
To: solr-user@lucene.apache.org
Subject: How to adjust maxDocs and maxTime for autoCommit?

I will start my pre-production step soon. How can I adjust maxDocs
and maxTime for autoCommit? What do you suggest for me to adjust that
parameters? 



Re: Speed up import of Hierarchical Data

2013-05-16 Thread Stefan Matheis
That sounds like a perfect match for 
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor :)

On Thursday, May 16, 2013 at 6:01 PM, O. Olson wrote:

 I am using the DataImportHandler to Query a SQL Server and populate Solr.
 Unfortunately, SQL does not have an understanding of hierarchical
 relationships, and hence I use Table Joins. The following is an outline of
 my table structure:  
  
  
 PROD_TABLE
 - SKU (Primary Key)
 - Title (varchar)
 - Descr (varchar)
  
 CAT_TABLE
 - SKU (Foreign Key)
 - CategoryLevel (int i.e. 1, 2, 3 …)
 - CategoryName (varchar)
  
 I specify the SQL Query in the db-data-config.xml file – a snippet of which
 looks like:  
  
 dataConfig
 dataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
 url=jdbc:sqlserver://localhost\/
 document
 entity name=Product  
 query=SELECT SKU, Title, Descr FROM PROD_TABLE
 field column=SKU name=SKU /
 field column=Title name=Title /
 field column=Descr name=Descr /
  
 entity name=Cat1  
 query=SELECT CategoryName from CAT_TABLE where
 SKU='${Product.SKU}' AND CategoryLevel=1
 field column=CategoryName name=Category1 /  
 /entity
 entity name=Cat2  
 query=SELECT CategoryName from CAT_TABLE where
 SKU='${Product.SKU}' AND CategoryLevel=2
 field column=CategoryName name=Category2 /  
 /entity
 entity name=Cat3  
 query=SELECT CategoryName from CAT_TABLE where
 SKU='${Product.SKU}' AND CategoryLevel=3
 field column=CategoryName name=Category3 /  
 /entity
  
 /entity
 /document
 /dataConfig
  
 It seems like the DataImportHandler handler sends out three or four queries
 for each Product. This results in a very slow import. Is there any way to
 speed this up? I would not mind an intermediate step of first extracting SQL
 and then putting it into Solr.
  
 Thank you for all your help.  
 O. O.
  
  
  
  
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Speed-up-import-of-Hierarchical-Data-tp4063924.html
 Sent from the Solr - User mailing list archive at Nabble.com 
 (http://Nabble.com).
  
  




Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry
Thanks Jack for your reply..

The problem is, I'm finding results for fq=title:(,10) but not for
fq=title:(, 10) - apologies if that was not clear from my first mail.
I have already mentioned the debug analysis in my previous mail.

Additionally, the title field is defined as below:
fieldType name=text_wc class=solr.TextField positionIncrementGap=100

 analyzer type=index
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.WordDelimiterFilterFactory
stemEnglishPossessive=0 generateWordParts=1 generateNumberParts=1
catenateWords=1 catenateNumbers=1 catenateAll=1 splitOnCaseChange=1
splitOnNumerics=0 preserveOriginal=1 /
filter class=solr.LowerCaseFilterFactory/
/analyzer
analyzer type=query
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.WordDelimiterFilterFactory
stemEnglishPossessive=0 generateWordParts=1 generateNumberParts=1
catenateWords=1 catenateNumbers=1 catenateAll=1 splitOnCaseChange=1
splitOnNumerics=0 preserveOriginal=1 /
filter class=solr.LowerCaseFilterFactory/
/analyzer
/fieldType

I have set the catenate options to 1 for all types.
I can understand ',' getting ignored when it is on its own (title:(, 10)),
but:
- Why is Solr not searching for 10 in that case, just like it did when the
query was title:(,10)?
- And why did the other filter queries (collection:assets) not show up in the
debug section?


Thanks,
Sandeep


On 16 May 2013 13:57, Jack Krupansky j...@basetechnology.com wrote:

 You haven't indicated any problem here! What is the symptom that you
 actually think is a problem.

 There is no comma operator in any of the Solr query parsers. Comma is just
 another character that may or may not be included or discarded depending on
 the specific field type and analyzer. For example, a white space analyzer
 will keep commas, but the standard analyzer or the word delimiter filter
 will discard them. If title were a string type, all punctuation would
 be preserved, including commas and spaces (but spaces would need to be
 escaped or the term text enclosed in parentheses.)

 Let us know what your symptom is though, first.

 I mean, the filter query looks perfectly reasonable from an abstract
 perspective.

 -- Jack Krupansky

 -Original Message- From: Sandeep Mestry
 Sent: Thursday, May 16, 2013 6:51 AM
 To: solr-user@lucene.apache.org
 Subject: Question about Edismax - Solr 4.0

 -- *Edismax and Filter Queries with Commas and spaces* --


 Dear Experts,

 This appears to be a bug, please suggest if I'm wrong.

 If I search with the following filter query,

 1) fq=title:(, 10)

 - I get no results.
 - The debug output does NOT show the section containing
 parsed_filter_queries

 if I carry a search with the filter query,

 2) fq=title:(,10) - (No space between , and 10)

 - I get results and the debug output shows the parsed filter queries
 section as,
 arr name=filter_queries
 str(titles:(,10))/str
 str(collection:assets)/str

 As you can see above, I'm also passing in other filter queries
 (collection:assets) which appear correctly but they do not appear in case 1
 above.

 I can't make this as part of the query parameter as that needs to be
 searched against multiple fields.

 Can someone suggest a fix in this case please. I'm using Solr 4.0.

 Many Thanks,
 Sandeep



SOLR test framework- ERROR: SolrIndexSearcher opens=1 closes=0

2013-05-16 Thread bbarani
I am using SOLR 4.3.0, I have created multiple custom components.

I am getting the below error when I run tests (using the SOLR 4.3 test
framework) against one of the custom components. All the tests pass, but I
still get the below error once the tests complete. Can someone help me
resolve this error?

java.lang.AssertionError: ERROR: SolrIndexSearcher opens=1 closes=0
at __randomizedtesting.SeedInfo.seed([C2DCAC50C9ACBACE]:0)
at org.junit.Assert.fail(Assert.java:93)
at
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:252)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:101)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:700)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:680)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-test-framework-ERROR-SolrIndexSearcher-opens-1-closes-0-tp4063940.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR test framework- ERROR: SolrIndexSearcher opens=1 closes=0

2013-05-16 Thread Shawn Heisey

On 5/16/2013 10:46 AM, bbarani wrote:

I am using SOLR 4.3.0, I have created multiple custom components.

I am getting the below error when I run tests (using SOLR 4.3 test
framework) against one of the custom componentAll the tests pass but I
still get the below error once test gets completed. Can someone help me
resolve this error?

java.lang.AssertionError: ERROR: SolrIndexSearcher opens=1 closes=0


It looks like you opened a searcher object as part of your test but then 
didn't close it.  If you didn't do this in the test itself, perhaps it's 
happening in your custom component.


I'm a little fuzzy on test writing, though.

Thanks.
Shawn



Re: Oracle Timestamp in SOLR

2013-05-16 Thread Chris Hostetter

:   SELECT ... CAST(LAST_ACTION_TIMESTAMP AS DATE) AS LAT
: 
: This removes the time part of the timestamp in SOLR. althought it is shown 
: in PL/SQL-Developer (Tool for Oracle).

Hmmm... that makes no sense to me based on 10 seconds of googling...

http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i1847

The DATE datatype stores the year (including the century), the month, the 
day, the hours, the minutes, and the seconds

...but i'll take your word for it.  

: The only way I found in the net is to write an own converter :-(

There must be *some* way to either tweak your SQL or tweak your JDBC 
connection properties such that Oracle's JDBC driver will give you a 
legitimate java.sql.Date or java.sql.Timestamp instead of its own 
internal class (that doesn't extend java.util.Date) ... otherwise it's 
just total freaking anarchy.



-Hoss


Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Jack Krupansky
Could you show us the full query URL - spaces must be encoded in URL query 
parameters.


Also show the actual field XML - you omitted that.

Try the same query as a main query, using both defType=edismax and 
defType=lucene.


Note that the filter query is parsed using the Lucene query parser, not 
edismax, independent of the defType parameter. But you don't have any 
edismax features in your fq anyway.


But you can stick {!edismax} in front of the query (e.g. 
fq={!edismax}title:(, 10)) to force edismax to be used for the fq, although 
it really shouldn't change anything.


Also, catenate is fine for indexing, but will mess up your queries at query 
time, so set them to 0 in the query analyzer
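
Based on the field type you posted, the query-side analyzer would then look
something like this (only the catenate options change):

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"
          splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>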


Also, make sure you have autoGeneratePhraseQueries=true on the field type, 
but that's not the issue here.


-- Jack Krupansky

-Original Message- 
From: Sandeep Mestry

Sent: Thursday, May 16, 2013 12:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Question about Edismax - Solr 4.0

Thanks Jack for your reply..

The problem is, I'm finding results for fq=title:(,10) but not for
fq=title:(, 10) - apologies if that was not clear from my first mail.
I have already mentioned the debug analysis in my previous mail.

Additionally, the title field is defined as below:
fieldType name=text_wc class=solr.TextField positionIncrementGap=100



analyzer type=index
   tokenizer class=solr.WhitespaceTokenizerFactory/
   filter class=solr.WordDelimiterFilterFactory
stemEnglishPossessive=0 generateWordParts=1 generateNumberParts=1
catenateWords=1 catenateNumbers=1 catenateAll=1 splitOnCaseChange=1
splitOnNumerics=0 preserveOriginal=1 /
   filter class=solr.LowerCaseFilterFactory/
   /analyzer
   analyzer type=query
   tokenizer class=solr.WhitespaceTokenizerFactory/
   filter class=solr.WordDelimiterFilterFactory
stemEnglishPossessive=0 generateWordParts=1 generateNumberParts=1
catenateWords=1 catenateNumbers=1 catenateAll=1 splitOnCaseChange=1
splitOnNumerics=0 preserveOriginal=1 /
   filter class=solr.LowerCaseFilterFactory/
   /analyzer
   /fieldType

I have the set catenate options to 1 for all types.
I can understand if ',' getting ignored when it is on its own (title:(,
10)) but
- Why solr is not searching for 10 in that case just like it did when the
query was (title:(,10))?
- And why other filter queries did not show up (collection:assets) in debug
section?


Thanks,
Sandeep


On 16 May 2013 13:57, Jack Krupansky j...@basetechnology.com wrote:


You haven't indicated any problem here! What is the symptom that you
actually think is a problem.

There is no comma operator in any of the Solr query parsers. Comma is just
another character that may or may not be included or discarded depending 
on

the specific field type and analyzer. For example, a white space analyzer
will keep commas, but the standard analyzer or the word delimiter filter
will discard them. If title were a string type, all punctuation would
be preserved, including commas and spaces (but spaces would need to be
escaped or the term text enclosed in parentheses.)

Let us know what your symptom is though, first.

I mean, the filter query looks perfectly reasonable from an abstract
perspective.

-- Jack Krupansky

-Original Message- From: Sandeep Mestry
Sent: Thursday, May 16, 2013 6:51 AM
To: solr-user@lucene.apache.org
Subject: Question about Edismax - Solr 4.0

-- *Edismax and Filter Queries with Commas and spaces* --


Dear Experts,

This appears to be a bug, please suggest if I'm wrong.

If I search with the following filter query,

1) fq=title:(, 10)

- I get no results.
- The debug output does NOT show the section containing
parsed_filter_queries

if I carry a search with the filter query,

2) fq=title:(,10) - (No space between , and 10)

- I get results and the debug output shows the parsed filter queries
section as,
arr name=filter_queries
str(titles:(,10))/str
str(collection:assets)/str

As you can see above, I'm also passing in other filter queries
(collection:assets) which appear correctly but they do not appear in case 
1

above.

I can't make this as part of the query parameter as that needs to be
searched against multiple fields.

Can someone suggest a fix in this case please. I'm using Solr 4.0.

Many Thanks,
Sandeep





RE: Speed up import of Hierarchical Data

2013-05-16 Thread Dyer, James
See https://issues.apache.org/jira/browse/SOLR-2943 .  You can set up 2 DIH 
handlers.  The first would query the CAT_TABLE and save it to a disk-backed 
cache, using DIHCacheWriter.  You then would replace your 3 child entities in 
the 2nd DIH handler to use DIHCacheProcessor to read back the cached data.  
This is a little complicated to do, but it would let you just cache the data 
once and because it is disk-backed, will scale to whatever size the CAT_TABLE 
is.  (For some details, see this thread: 
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tt4015514.html)

A simpler method is simply to specify cacheImpl=SortedMapBackedCache on the 3 
child entities.  (This is the same as using CachedSqlEntityProcessor.)  It 
would generate 3 in-memory caches, each with the same data.  If CAT_TABLE is 
small, this would be adequate.  
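
A sketch of what that looks like on one of the three child entities (the other
two are analogous; names follow the config posted earlier in the thread):

<entity name="Cat1"
        cacheImpl="SortedMapBackedCache"
        cacheKey="SKU"
        cacheLookup="Product.SKU"
        query="SELECT SKU, CategoryName FROM CAT_TABLE WHERE CategoryLevel=1">
  <field column="CategoryName" name="Category1"/>
</entity>

Note the query now pulls the whole level in one shot (including the SKU column
used as the cache key) instead of filtering by ${Product.SKU} for every parent
row.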

In between this would be to create a disk-backed cache Impl (or use the ones at 
SOLR-2613 or SOLR-2948) and specify it on cacheImpl.  It would still create 3 
identical caches, but they would be disk-backed and could scale beyond what 
in-memory can handle.

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: O. Olson [mailto:olson_...@yahoo.it] 
Sent: Thursday, May 16, 2013 11:01 AM
To: solr-user@lucene.apache.org
Subject: Speed up import of Hierarchical Data

I am using the DataImportHandler to Query a SQL Server and populate Solr.
Unfortunately, SQL does not have an understanding of hierarchical
relationships, and hence I use Table Joins. The following is an outline of
my table structure: 


PROD_TABLE
- SKU (Primary Key)
- Title  (varchar)
- Descr (varchar)

CAT_TABLE
- SKU (Foreign Key)
-  CategoryLevel (int i.e. 1, 2, 3 …)
- CategoryName  (varchar)

I specify the SQL Query in the db-data-config.xml file – a snippet of which
looks like: 

dataConfig
dataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
url=jdbc:sqlserver://localhost\/
document
entity name=Product 
query=SELECT SKU, Title, Descr FROM 
PROD_TABLE
field column=SKU name=SKU /
field column=Title name=Title /
field column=Descr name=Descr /

entity name=Cat1  
query=SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=1
field column=CategoryName name=Category1 
/ 
/entity
entity name=Cat2  
query=SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=2
field column=CategoryName name=Category2 
/ 
/entity
entity name=Cat3  
query=SELECT CategoryName from CAT_TABLE where
SKU='${Product.SKU}' AND CategoryLevel=3
field column=CategoryName name=Category3 
/ 
/entity

/entity
/document
/dataConfig

It seems like the DataImportHandler handler sends out three or four queries
for each Product. This results in a very slow import. Is there any way to
speed this up? I would not mind an intermediate step of first extracting SQL
and then putting it into Solr.

Thank you for all your help. 
O. O.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Speed-up-import-of-Hierarchical-Data-tp4063924.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Strange fuzzy behavior in 4.2.1

2013-05-16 Thread Jack Krupansky
Go ahead and file a Jira and hopefully that will attract some committer 
attention that might shed some more light.


Beyond that, sure you can build Solr yourself and change the query parser 
code to put a larger number in for maxExpansion.


You might also try developing a test case, say 100 small test documents with 
similar values and see if the 50 limit seems to account for behavior that 
you see with that test dataset.
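
If it helps, here is a rough standalone check (assuming the Lucene 4.2 API;
the index path and field name are illustrative) that runs the same fuzzy term
directly against a copy of the index with a much larger maxExpansions:

import java.io.File;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class FuzzyExpansionCheck {
  public static void main(String[] args) throws Exception {
    // Point this at a copy of the index directory.
    DirectoryReader reader =
        DirectoryReader.open(FSDirectory.open(new File("/path/to/index")));
    IndexSearcher searcher = new IndexSearcher(reader);

    // FuzzyQuery(term, maxEdits, prefixLength, maxExpansions, transpositions).
    // The stock parsers use FuzzyQuery.defaultMaxExpansions (50); raise it here.
    FuzzyQuery query = new FuzzyQuery(new Term("name", "julia"), 1, 0, 1024, true);

    TopDocs hits = searcher.search(query, 10);
    System.out.println("totalHits=" + hits.totalHits);
    for (ScoreDoc sd : hits.scoreDocs) {
      System.out.println(searcher.doc(sd.doc).get("name"));
    }
    reader.close();
  }
}

If the missing matches show up with the larger limit but not with 50, that
points pretty squarely at maxExpansions.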


-- Jack Krupansky

-Original Message- 
From: Ryan Wilson

Sent: Thursday, May 16, 2013 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange fuzzy behavior in 4.2.1

This might explain why our dev database of 400,000 records doesn't seem to
suffer from this.  When we started seeing this in our test environment of
300,000,000 records, we thought we just weren't finding records in dev that
were having the problem.

One thing that this does not explain is that we have located a few terms
that find nothing but the original term, despite having possible matches
one edit away. For example, albert will not find anything but albert,
despite there being alberta, albart, etc. I am reading into the
maxExpansion variable and how it functions as I am writing this, so I might
be missing the connection.

I note that you say this is a hardcoded behavior. Would I be safe in
assuming that I will need to build a custom solr.war to make changes to
this setting? I wan to see if sliding this number up/down will let me
confirm that it is indeed maxExpansions that is the problem.

Finally, if it is maxExpansions that is the problem is there any solution
beyond the aforementioned custom war?

-Ryan Wilson
On Thu, May 16, 2013 at 8:40 AM, Jack Krupansky 
j...@basetechnology.comwrote:



Maybe you are running into the same problem I posted on another message
thread about the hard-coded maxExpansions limit of 50. In other words, 
once

Lucene finds 50 terms that do match, it won't find the additional matches.
And that is not necessarily the top 50, but the first 50 in the index.

See if you can reproduce the problem with a small data set of no more than
a couple dozen documents.

-- Jack Krupansky
-Original Message- From: Ryan Wilson
Sent: Thursday, May 16, 2013 9:28 AM
To: solr-user@lucene.apache.org
Subject: RE: Strange fuzzy behavior in 4.2.1


In answering your first questions, any changes we’ve been making have been
followed by a reindex.



The data that is being indexed generally looks something like this 
(space

indicating an actual space):



TIM space , space JULIO

JULIE space , space JIM



So based off what we see from looking at top terms in the field and the
analysis tool, at index time these records are being broken up such that
TIM , JULIO can be found with tim or Julio.



Just to make sure I’m not misunderstanding something about Solr/Lucene,
when a record is indexed the index analysis chain result (tim ,
julio) is what is written to disk correct? So far as I understand it it’s
the query analysis chain that has the issue with most filters not being
applied during wildcard and fuzzy queries.



Finally, some clarification as I’ve realized my original email might not
have made this point well. I can have a particular record with a primary
key of X and a name value of LEWIS , JULIA and be able to find that exact
record with bulia~1 but not aulia~1,   or GUERRERO , JULIAN , JULIAN can 
be

found with julan~1 but not julia~1. It’s not that records go missing when
searched for with fuzzy, but rather the  fuzzy terms that will find them
seem, to my eyes, inconsistent.



Regards,

Ryan Wilson
rpwils...@gmail.com





Re: Deleting an entry from a collection when they key has : in it

2013-05-16 Thread Jack Krupansky
You need to escape colons in queries, using either a backslash or enclosing 
the full query term in quotes.


In your case, you have backslashes as well in your query, which the query 
parser will interpret as an escape! So, you need to escape those backslashes 
as well:


D\:\\somedir\\somefile.pdf

or

D:\\somedir\\somefile.pdf
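
Another option that sidesteps query-parser escaping entirely is delete-by-id,
since the id is matched literally against the uniqueKey field. A sketch (the
path below is illustrative):

curl http://localhost:8983/solr/docrepo/update?commit=true \
-H 'Content-type:text/xml' \
--data-binary '<delete><id>D:\Webdocs\sw4\docRepo\documents\somefile.docx</id></delete>'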

-- Jack Krupansky

-Original Message- 
From: Daniel Baughman

Sent: Thursday, May 16, 2013 11:33 AM
To: solr-user@lucene.apache.org
Subject: Deleting an entry from a collection when they key has : in it

Hi All,



I seem to be really struggling to delete an entry from  a search repository
that has a : in the key.



The key is path to the file ie, D:\somedir\somefile.pdf.



I want to use a query to delete it and I just can't seem to make it go away.



I've been trying stuff like this:

http://localhost:8983/solr/docrepo/update/?stream.body=%3Cdelete%3E%3Cquery%
3Ekey%3AD\:\\Webdocs\\sw4\\docRepo\\documents\\Hiring%20Manager\\Disciplinar
y\\asdfasdf\.docx%3C%2Fquery%3E%3C%2Fdelete%3E
http://localhost:8983/solr/docrepo/update/?stream.body=%3Cdelete%3E%3Cquery
%3Ekey%3AD\:\\Webdocs\\sw4\\docRepo\\documents\\Hiring%20Manager\\Disciplina
ry\\asdfasdf\.docx%3C%2Fquery%3E%3C%2Fdelete%3Eversion=2.2start=0rows=10
indent=on version=2.2start=0rows=10indent=on



It doesn't throw an error but it doesn't delete the document either.



Does anyone have any suggestions?



Thanks,

Dan



Re: Facets referenced by key

2013-05-16 Thread Chris Hostetter

: I would then like to refer to these 'pseudo' field later in the request
: string. I thought this would be how I'd do it:
: 
: f.my_facet_key.facet.prefix=a_given_prefix
...


that syntax was proposed in SOLR-1351 and a patch was made available, but 
it was never committed (it only supported a subset of faceting, needed more 
tests, and had unclear behavior about how the defaults were picked if 
you combined f.key.facet.foo + f.field.facet.foo + facet.foo)

: I thought this would work, however it doesn't appear to. What does work is
: if I define the prefix and mincount in the local params:
: 
: facet.field={!ex=dt key=my_facet_key 
facet.prefix=a_given_prefix}the_facet_field

Correct, SOLR-4717 added support to Solr 4.3 for specifying all of the 
facet options as local params such that that syntax would work.  Given the 
way the use of Solr and localparams have evolved over the years, it was 
considered a more natural and logical way to specify facet options on a per 
field or per key basis.

: Is this expected? I'm also using sunspot and they construct the queries
: with keys as in my first example, i.e. facet.field={!ex=dt
: key=my_facet_key}the_facet_fieldf.my_facet_key.facet.prefix=a_given_prefix

I can't comment on that ... i'm not sure why sunspot would assume that 
behavior would work (unless someone looked at SOLR-1351 once upon a time 
and assumed that would definitely be official at some point)

-Hoss


RE: Deleting an entry from a collection when they key has : in it

2013-05-16 Thread Daniel Baughman
Thanks for the idea
http://localhost:8983/solr/docrepo/update/?stream.body=%3Cdelete%3E%3Cquery%
3Ekey%3AD\:\\Webdocs\\sw4\\docRepo\\documents\\Hiring%20Manager\\Disciplinar
y\\asdfasdf\.docx%3C%2Fquery%3E%3C%2Fdelete%3E

I do have :'s and  \'s escaped, I believe.

If in my schema, I have the key field set to indexed=false, then is that
maybe the issue?  I'm going to try to set that to true and rebuild the
repository and see if that does it.


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Thursday, May 16, 2013 11:20 AM
To: solr-user@lucene.apache.org
Subject: Re: Deleting an entry from a collection when they key has : in it

You need to escape colons in queries, using either a backslash or enclosing
the full query term in quotes.

In your case, you have backslashes as well in your query, which the query
parser will interpret as an escape! So, you need to escape those backslashes
as well:

D\:\\somedir\\somefile.pdf

or

D:\\somedir\\somefile.pdf

-- Jack Krupansky

-Original Message-
From: Daniel Baughman
Sent: Thursday, May 16, 2013 11:33 AM
To: solr-user@lucene.apache.org
Subject: Deleting an entry from a collection when they key has : in it

Hi All,



I seem to be really struggling to delete an entry from  a search repository
that has a : in the key.



The key is path to the file ie, D:\somedir\somefile.pdf.



I want to use a query to delete it and I just can't seem to make it go away.



I've been trying stuff lke this:

http://localhost:8983/solr/docrepo/update/?stream.body=%3Cdelete%3E%3Cquery%
3Ekey%3AD\:\\Webdocs\\sw4\\docRepo\\documents\\Hiring%20Manager\\Disciplinar
y\\asdfasdf\.docx%3C%2Fquery%3E%3C%2Fdelete%3E
http://localhost:8983/solr/docrepo/update/?stream.body=%3Cdelete%3E%3Cquery
%3Ekey%3AD\:\\Webdocs\\sw4\\docRepo\\documents\\Hiring%20Manager\\Disciplina
ry\\asdfasdf\.docx%3C%2Fquery%3E%3C%2Fdelete%3Eversion=2.2start=0rows=10
indent=on version=2.2start=0rows=10indent=on



It doesn't throw an error but it doesn't delete the document either.



Does anyone have any suggestions?



Thanks,

Dan



Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread M. Flatterie
Greetings, I just started with Solr a couple weeks ago, with version 4.2.1.

I installed the following setup:
- ZooKeeper: 3 instances ensemble
- Solr: on Tomcat, 4 instances
    - WebOrder_Collection: instances 1 and 2, 1 shard, 1 master, 1 replica

    - other_collectionA: instances 3 and 4, 1 shard, 1 master, 1 replica

    - other_CollectionB: instances 3 and 4, 1 shard, 1 master, 1 replica


With version 4.2.1 everything works fine.  But I do have a problem if I query 
instance 3 for something in the WebOrder_Collection.  I found that this is a 
bug in 4.2.1; I must query instances 1 or 2 to get results from 
WebOrder_Collection.


Now that I have upgraded to 4.3.0 I have the following problem.  My replicas 
will not recover.  The recovery will retry, and retry, ... forever.

Details.  If I look at the Zoo, I see that:
 - node_name
10.0.2.15:8180_solr    in solr 4.2.1
10.0.2.15:8180_ in solr 4.3.0
 - base_url
            http://10.0.2.15:8180/solr  in solr 4.2.1

            http://10.0.2.15:8180    in solr 4.3.0

My solr logs show this:

8869687 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy  – Error 
while trying to recover. 
core=WebOrder_Collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://10.0.2.15:8180 returned non ok status:404, message:Not Found
    at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
    at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
    at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)


I have not been able to find more info than that.  The Solr cloud diagram shows 
instance1 as active and leader, instance 2 as recovering.  My solrconfig.xml 
files are identical, except for the LUCENE_42 or LUCENE_43 tag.


Any idea?  I hope that it is a configuration issue on my part...

Thank you for any help, Nic.


Re: Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread Mark Miller
Your solr webapp context appears to be empty rather than solr. There was a JIRA 
issue in 4.3 that may have affected this, but I only saw it from a distance, so 
this is just a guess.

What does it say in solr.xml for the context (an attribute on cores)
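
For reference, the attribute in question is hostContext on the cores element
of the old-style solr.xml, roughly like this (values are illustrative):

<solr persistent="true">
  <cores adminPath="/admin/cores" host="${host:}" hostPort="8180" hostContext="solr">
    <core name="WebOrder_Collection" instanceDir="WebOrder_Collection"/>
  </cores>
</solr>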

- Mark

On May 16, 2013, at 2:02 PM, M. Flatterie nicflatte...@yahoo.com wrote:

 Greetings, I just started with Solr a couple weeks ago, with version 4.2.1.
 
 I installed the following setup:
 - ZooKeeper: 3 instances ensemble
 - Solr: on Tomcat, 4 instances
 - WebOrder_Collection: instances 1 and 2, 1 shard, 1 master, 1 replica
 
 - other_collectionA: instances 3 and 4, 1 shard, 1 master, 1 replica
 
 - other_CollectionB: instances 3 and 4, 1 shard, 1 master, 1 replica
 
 
 With version 4.2.1 everything works fine.  But I do have a problem if I query 
 instance 3 for something in the WebOrder_Collection.  I found that this is a 
 bug in 4.2.1,. I must query instances 1 or 2 to get results from 
 WebOrder_Collection.
 
 
 Now that I have upgraded to 4.3.0 I have the following problem.  My replicas 
 will not recover.  The recovery will retry, and retry, ... forever.
 
 Details.  If I look at the Zoo, I see that:
  - node_name
 10.0.2.15:8180_solrin solr 4.2.1
 10.0.2.15:8180_ in solr 4.3.0
  - base_url
 http://10.0.2.15:8180/solr  in solr 4.2.1
 
 http://10.0.2.15:8180in solr 4.3.0
 
 My solr logs show this:
 
 8869687 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy  – 
 Error while trying to recover. 
 core=WebOrder_Collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
  Server at http://10.0.2.15:8180 returned non ok status:404, message:Not Found
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at 
 org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
 at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
 at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
 
 
 I have not been able to find more info than that.  The Solr cloud diagram 
 shows instance1 as active and leader, instance 2 as recovering.  My 
 solrconfig.xml are identical, except for the LUCENE_42 or LUCENE_43 tag.
 
 
 Any idea?  I hope that it is a configuration issue on my part...
 
 Thank you for any help, Nic.



having trouble storing large text blob fields - returns binary address in search results

2013-05-16 Thread geeky2
hello 

environment: solr 3.5

can someone help me with the correct configuration for some large text blob
fields?

we have two fields in informix tables that are of type text. 

when we do a search the results for these fields come back looking like
this: 

str name=attributes[B@17c232ee/str

i have tried setting them up as clob fields - but this is not working (see
details below)

i have also tried treating them as plain string fields (removing the
references to clob in the DIH) - but this does not work either.


DIH configuration:


  entity transformer="TemplateTransformer,ClobTransformer"
name="core1-parts" query="select 
summ.*, 
1 as item_type, 
1 as part_cnt, 
'' as brand, 
...

field column="attr_val" name="attributes" clob="true" /
field column="rsr_val" name="restrictions" clob="true" /


Schema.xml

  field name=attributes type=string indexed=false stored=true/
field name=restrictions type=string indexed=false stored=true/

thx
mark





--
View this message in context: 
http://lucene.472066.n3.nabble.com/having-trouble-storing-large-text-blob-fields-returns-binary-address-in-search-results-tp4063979.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread M. Flatterie
Afternoon, my solr.xml file is the following (and has not changed from 4.2.1 to 
4.3.0):

    Context path=/solr 
docBase=/home/tcatadm1/apache-tomcat-7.0.39/webapps/solr.war debug=0 
crossContext=true
    Environment name=solr/home type=java.lang.String 
value=/home/solradm1 override=true/
    /Context




 From: Mark Miller markrmil...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Thursday, May 16, 2013 2:28:52 PM
Subject: Re: Migrating from 4.2.1 to 4.3.0
 

Your solr webapp context appears to be  rather than solr. There was a JIRA 
issue in 4.3 that may have affected this, but I only saw it from a distance, so 
just a guess.

What does it say in solr.xml for the context (an attribute on cores)

- Mark

On May 16, 2013, at 2:02 PM, M. Flatterie nicflatte...@yahoo.com wrote:

 Greetings, I just started with Solr a couple weeks ago, with version 4.2.1.
 
 I installed the following setup:
 - ZooKeeper: 3 instances ensemble
 - Solr: on Tomcat, 4 instances
     - WebOrder_Collection: instances 1 and 2, 1 shard, 1 master, 1 replica
 
     - other_collectionA: instances 3 and 4, 1 shard, 1 master, 1 replica
 
     - other_CollectionB: instances 3 and 4, 1 shard, 1 master, 1 replica
 
 
 With version 4.2.1 everything works fine.  But I do have a problem if I query 
 instance 3 for something in the WebOrder_Collection.  I found that this is a 
 bug in 4.2.1,. I must query instances 1 or 2 to get results from 
 WebOrder_Collection.
 
 
 Now that I have upgraded to 4.3.0 I have the following problem.  My replicas 
 will not recover.  The recovery will retry, and retry, ... forever.
 
 Details.  If I look at the Zoo, I see that:
      - node_name
             10.0.2.15:8180_solr        in solr 4.2.1
             10.0.2.15:8180_             in solr 4.3.0
      - base_url
            http://10.0.2.15:8180/solr      in solr 4.2.1
 
            http://10.0.2.15:8180            in solr 4.3.0
 
 My solr logs show this:
 
 8869687 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy  – 
 Error while trying to recover. 
 core=WebOrder_Collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
  Server at http://10.0.2.15:8180 returned non ok status:404, message:Not Found
     at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
     at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
     at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
     at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
 
 
 I have not been able to find more info than that.  The Solr cloud diagram 
 shows instance1 as active and leader, instance 2 as recovering.  My 
 solrconfig.xml are identical, except for the LUCENE_42 or LUCENE_43 tag.
 
 
 Any idea?  I hope that it is a configuration issue on my part...
 
 Thank you for any help, Nic.

Re: Facets referenced by key

2013-05-16 Thread Brendan Grainger
Thanks for the excellent clarification. I'll ask the sunspot guys about the 
localparams issue. I have a patch that would fix it.

Thanks 
Brendan

On May 16, 2013, at 1:42 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 
 : I would then like to refer to these 'pseudo' field later in the request
 : string. I thought this would be how I'd do it:
 : 
 : f.my_facet_key.facet.prefix=a_given_prefix
...
 
 
 that syntax was proposed in SOLR-1351 and a patch was made available, but 
 it was never committed (it only supported a subset of faceting, needed more 
 tests, and had unclear behavior about how the defaults were picked if 
 you combined f.key.facet.foo + f.field.facet.foo + facet.foo)
 
 : I thought this would work, however it doesn't appear to. What does work is
 : if I define the prefix and mincount in the local params:
 : 
 : facet.field={!ex=dt key=my_facet_key 
 facet.prefix=a_given_prefix}the_facet_field
 
 Correct, SOLR-4717 added support to Solr 4.3 for specifying all of the 
 facet options as local params such that that syntax would work.  Given the 
 way the use of Solr and localparams have evolved over the years, it was 
 considered a more natural and logical way to specify facet options on a 
 per-field or per-key basis.
 
 : Is this expected? I'm also using sunspot and they construct the queries
 : with keys as in my first example, i.e. facet.field={!ex=dt
 : key=my_facet_key}the_facet_field&f.my_facet_key.facet.prefix=a_given_prefix
 
 I can't comment on that ... i'm not sure why sunspot would assume that 
 behavior would work (unless someone looked at SOLR-1351 once upon a time 
 and assumed that would definitely be official at some point)
 
 -Hoss


Re: Oracle Timestamp in SOLR

2013-05-16 Thread Shawn Heisey

On 5/16/2013 11:00 AM, Chris Hostetter wrote:

There must be *some* way to either tweak your SQL or tweak your JDBC
connection properties such that Oracle's JDBC driver will give you a
legitimate java.sql.Date or java.sql.Timestamp instead of its own
internal class (that doesn't extend java.util.Date) ... otherwise it's
just total freaking anarchy.


Looks like you can use the V8Compatible connection property or upgrade 
the oracle jdbc driver.  Upgrading the driver is probably the best option.


http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-faq-090281.html#08_01
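
If you do stay on the older driver, the flag can be set either as a JVM system property or (if I remember right, since JdbcDataSource passes unrecognized attributes to the driver as connection properties) directly on the DIH dataSource. A rough sketch - the Oracle URL and credentials are placeholders:

java -Doracle.jdbc.V8Compatible=true -jar start.jar

<dataSource type="JdbcDataSource" driver="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
            user="scott" password="tiger"
            oracle.jdbc.V8Compatible="true" />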

Thanks,
Shawn



Aggregate word counts over a subset of documents

2013-05-16 Thread David Larochelle
Is there a way to get aggregate word counts over a subset of documents?

For example given the following data:

  {
    "id": 1,
    "category": "cat1",
    "includes": "The green car."
  },
  {
    "id": 2,
    "category": "cat1",
    "includes": "The red car."
  },
  {
    "id": 3,
    "category": "cat2",
    "includes": "The black car."
  }

I'd like to be able to get total term frequency counts per category. e.g.

<category name="cat1">
   <lst name="the">2</lst>
   <lst name="car">2</lst>
   <lst name="green">1</lst>
   <lst name="red">1</lst>
</category>
<category name="cat2">
   <lst name="the">1</lst>
   <lst name="car">1</lst>
   <lst name="black">1</lst>
</category>

I was initially hoping to do this within Solr and I tried using the
TermFrequencyComponent. This gives term frequencies for individual
documents and term frequencies for the entire index but doesn't seem to
help with subsets. For example, TermFrequencyComponent would tell me that
car occurs 3 times over all documents in the index and 1 time in document 1
but not that it occurs 2 times over cat1 documents and 1 time over cat2
documents.

Is there a good way to use Solr/Lucene to gather aggregate results like
this? I've been focusing on just using Solr with XML files but I could
certainly write Java code if necessary.

Thanks,

David


Re: Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread Shawn Heisey

On 5/16/2013 12:37 PM, M. Flatterie wrote:

Afternoon, my solr.xml file is the following (and has not changed from 4.2.1 to 
4.3.0):

 <Context path="/solr" docBase="/home/tcatadm1/apache-tomcat-7.0.39/webapps/solr.war" debug="0" crossContext="true">
     <Environment name="solr/home" type="java.lang.String" value="/home/solradm1" override="true"/>
 </Context>


That is not the solr.xml Mark is referring to.  This solr.xml configures 
tomcat to load Solr.  You will have /home/solradm1/solr.xml as well, 
that is the one we are concerned with.


Thanks,
Shawn



Re: Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread M. Flatterie
Oops, sorry about that - since it was referring to a context I thought it was the 
Tomcat one.

Here is the /home/solradm1/solr.xml file (comments removed!)

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
    <cores adminPath="/admin/cores" defaultCoreName="WebOrder_Collection" host="${host:}" hostPort="8180" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="WebOrder_Collection" instanceDir="WebOrder_Collection">
            <property name="solr.data.dir" value="/home/solradm1/WebOrder_Collection/data" />
            <property name="solr.ulog.dir" value="/home/solradm1/WebOrder_Collection/ulog" />
        </core>
    </cores>
</solr>



Note: I configure solr.data.dir and solr.ulog.dir so I can run two instances on 
the same system and separate the data and ulog directories between the 
instances.

Nic.





 From: Shawn Heisey s...@elyograg.org
To: solr-user@lucene.apache.org 
Sent: Thursday, May 16, 2013 3:29:41 PM
Subject: Re: Migrating from 4.2.1 to 4.3.0
 

On 5/16/2013 12:37 PM, M. Flatterie wrote:
 Afternoon, my solr.xml file is the following (and has not changed from 4.2.1 
 to 4.3.0):

          <Context path="/solr" docBase="/home/tcatadm1/apache-tomcat-7.0.39/webapps/solr.war" debug="0" crossContext="true">
              <Environment name="solr/home" type="java.lang.String" value="/home/solradm1" override="true"/>
          </Context>

That is not the solr.xml Mark is referring to.  This solr.xml configures 
tomcat to load Solr.  You will have /home/solradm1/solr.xml as well, 
that is the one we are concerned with.

Thanks,
Shawn

Re: Aggregate word counts over a subset of documents

2013-05-16 Thread Jason Hellman
David,

A Pivot Facet could possibly provide these results by the following syntax:

facet.pivot=category,includes

We would presume that includes is a tokenized field, and thus a set of facet 
values would be rendered from the terms resulting from that tokenization.  These 
would be nested under each category… and, of course, the entire set of documents 
considered for these facets is constrained by the current query.

I think this maps to your requirement.
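
A minimal request along those lines might look like the following (host, port and core name are placeholders; the field names come from the example above):

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.pivot=category,includes

The per-term counts then appear nested under each category value in the facet_pivot section of the response.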

Jason

On May 16, 2013, at 12:29 PM, David Larochelle 
dlaroche...@cyber.law.harvard.edu wrote:

 Is there a way to get aggregate word counts over a subset of documents?
 
 For example given the following data:
 
  {
id: 1,
category: cat1,
includes: The green car.,
  },
  {
id: 2,
category: cat1,
includes: The red car.,
  },
  {
id: 3,
category: cat2,
includes: The black car.,
  }
 
 I'd like to be able to get total term frequency counts per category. e.g.
 
 category name=cat1
   lst name=the2/lst
   lst name=car2/lst
   lst name=green1/lst
   lst name=red1/lst
 /category
 category name=cat2
   lst name=the1/lst
   lst name=car1/lst
   lst name=black1/lst
 /category
 
 I was initially hoping to do this within Solr and I tried using the
 TermFrequencyComponent. This gives term frequencies for individual
 documents and term frequencies for the entire index but doesn't seem to
 help with subsets. For example, TermFrequencyComponent would tell me that
 car occurs 3 times over all documents in the index and 1 time in document 1
 but not that it occurs 2 times over cat1 documents and 1 time over cat2
 documents.
 
 Is there a good way to use Solr/Lucene to gather aggregate results like
 this? I've been focusing on just using Solr with XML files but I could
 certainly write Java code if necessary.
 
 Thanks,
 
 David



wiki versus downloads versus archives

2013-05-16 Thread Benson Margulies
http://wiki.apache.org/solr/Solr3.1 claims that Solr3.1 is available in a
place where it is not, and I can't find a link on the front page to the
archive for old releases.


Re: Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread Shawn Heisey

On 5/16/2013 1:40 PM, M. Flatterie wrote:

Oups sorry about that, since it was referring context I thought it was the 
Tomcat one.

Here is the /home/solradm1/solr.xml file (comments removed!)

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
 <cores adminPath="/admin/cores" defaultCoreName="WebOrder_Collection" host="${host:}" hostPort="8180" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
  <core name="WebOrder_Collection" instanceDir="WebOrder_Collection">
   <property name="solr.data.dir" value="/home/solradm1/WebOrder_Collection/data" />
   <property name="solr.ulog.dir" value="/home/solradm1/WebOrder_Collection/ulog" />
  </core>
 </cores>
</solr>


The hostContext attribute needs changing.  It should be this instead:

hostContext="${hostContext:/solr}"

Looks like the previous version wasn't taking this attribute from your 
config, but the new version is.  This is probably a bug that was fixed 
in 4.3.


Thanks,
Shawn



Re: wiki versus downloads versus archives

2013-05-16 Thread Shawn Heisey

On 5/16/2013 2:21 PM, Benson Margulies wrote:

http://wiki.apache.org/solr/Solr3.1 claims that Solr3.1 is available in a
place where it is not, and I can't find a link on the front page to the
archive for old releases.


Download links fixed on the wiki pages for 3.1 and 3.2.

Thanks,
Shawn



Re: wiki versus downloads versus archives

2013-05-16 Thread Benson Margulies
Thanks.


On Thu, May 16, 2013 at 4:28 PM, Shawn Heisey s...@elyograg.org wrote:

 On 5/16/2013 2:21 PM, Benson Margulies wrote:

 http://wiki.apache.org/solr/Solr3.1 claims that Solr3.1 is available in a
 place where it is not, and I can't find a link on the front page to the
 archive for old releases.


 Download links fixed on the wiki pages for 3.1 and 3.2.

 Thanks,
 Shawn




Re: Zookeeper Ensemble Startup Parameters For SolrCloud?

2013-05-16 Thread Furkan KAMACI
Hi Shawn;

You have some tips about JVM parameters for starting a Solr node. What do you
do specially for Solr when you start a Zookeeper ensemble, e.g. heap size?

2013/5/16 Shawn Heisey s...@elyograg.org

 On 5/16/2013 9:25 AM, Furkan KAMACI wrote:

 I know that there have been many conversations about SolrCloud startup
 tips
 i.e. which type of garbage collector to use etc. Also I know that  there
 is
 no an exact answer for this question. However I think that folks have some
 tips about this question.

 How do you start up your external Zookeeper, with which parameters and any
 tips for it?


 An external zookeeper is just that - external, not part of Solr.  I
 followed the zookeeper docs, and used the normal zookeeper port, 2181:

 http://zookeeper.apache.org/doc/r3.4.5/

 Thanks,
 Shawn




SurroundQParser does not analyze the query text

2013-05-16 Thread Isaac Hebsh
Hi,

I'm trying to use Surround Query Parser for two reasons, which are not
covered by proximity slops:
1. find documents with two words within a given distance, *unordered*
2. given two lists of words, find documents with (at least) one word from
list A and (at least) one word from list B, within a given distance.

The surround query parser looks great, but it has one big drawback - it
does not analyze the query text. This is documented in the [weak :(] wiki
page.

Can this issue be solved somehow, or is it a bigger constraint?
Should I open a JIRA issue for this?
Any work-around?


Re: Zookeeper Ensemble Startup Parameters For SolrCloud?

2013-05-16 Thread Shawn Heisey

On 5/16/2013 2:34 PM, Furkan KAMACI wrote:

You have some tips about JVM parameters starting a Solr node. What do you
have special for Solr when you start a Zookeeper ensemble. i.e. heap size?


I haven't given it any JVM options.  The ZK process on my primary server 
has a 5GB virtual memory size and is using 131MB of system memory.  If 
you're not going to be creating a large number of collections or replicas 
and you're not using super-large config files, you could probably limit 
the max heap to a pretty small number and be OK.
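
If you do want to pin the heap, ZooKeeper's zkServer.sh picks up JVM flags from a conf/java.env file (sourced via zkEnv.sh), so something like the following is usually enough - the sizes here are only an illustration, not a recommendation:

# conf/java.env
export JVMFLAGS="-Xms128m -Xmx512m"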


Thanks,
Shawn



Re: Migrating from 4.2.1 to 4.3.0

2013-05-16 Thread M. Flatterie
Great it works, I am back on track!  Thank you!!!
Nic




 From: Shawn Heisey s...@elyograg.org
To: solr-user@lucene.apache.org 
Sent: Thursday, May 16, 2013 4:25:09 PM
Subject: Re: Migrating from 4.2.1 to 4.3.0
 

On 5/16/2013 1:40 PM, M. Flatterie wrote:
 Oups sorry about that, since it was referring context I thought it was the 
 Tomcat one.

 Here is the /home/solradm1/solr.xml file (comments removed!)

 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="true">
      <cores adminPath="/admin/cores" defaultCoreName="WebOrder_Collection" host="${host:}" hostPort="8180" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
          <core name="WebOrder_Collection" instanceDir="WebOrder_Collection">
              <property name="solr.data.dir" value="/home/solradm1/WebOrder_Collection/data" />
              <property name="solr.ulog.dir" value="/home/solradm1/WebOrder_Collection/ulog" />
          </core>
      </cores>
 </solr>

The hostContext attribute needs changing.  It should be this instead:

hostContext="${hostContext:/solr}"

Looks like the previous version wasn't taking this attribute from your 
config, but the new version is.  This is probably a bug that was fixed 
in 4.3.

Thanks,
Shawn

SOLR Junit test - How to resolve error - 'thread leaked from SUITE scope'?

2013-05-16 Thread bbarani

I am using SOLR 4.3.0. I am currently getting the below error when running
tests for custom SOLR components. The tests pass without any issues, but I am
getting the below error after the tests are done. Can someone let me know how to
resolve this issue?

thread leaked from SUITE scope at com.solr.activemq.TestWriter: 
[junit]1) Thread[id=19, name=ActiveMQ Scheduler, state=WAITING,
group=TGRP-TestWriter]
[junit] at java.lang.Object.wait(Native Method)
[junit] at java.lang.Object.wait(Object.java:503)
[junit] at java.util.TimerThread.mainLoop(Timer.java:526)
[junit] at java.util.TimerThread.run(Timer.java:505)
[junit] com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread
leaked from SUITE scope at com.solr.activemq.TestWriter: 
[junit]1) Thread[id=19, name=ActiveMQ Scheduler, state=WAITING,
group=TGRP-TestWriter]
[junit] at java.lang.Object.wait(Native Method)
[junit] at java.lang.Object.wait(Object.java:503)
[junit] at java.util.TimerThread.mainLoop(Timer.java:526)
[junit] at java.util.TimerThread.run(Timer.java:505)
[junit] at __randomizedtesting.SeedInfo.seed([64E0A7A0D98E09EE]:0)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Junit-test-How-to-resolve-error-thread-leaked-from-SUITE-scope-tp4064026.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Zookeeper Ensemble Startup Parameters For SolrCloud?

2013-05-16 Thread Furkan KAMACI
Hi Shawn;

I will have a total of 18 Solr nodes in my current pre-prototype environment
over one collection, and I don't have large config files. I know that the best
and only recommended practice for estimating the heap size my system needs
is to run load tests, and I will.

I asked this question because of an example at Zookeeper wiki:

You should take special care to set your Java max heap size correctly. In
particular, you should not create a situation in which ZooKeeper swaps to
disk. The disk is death to ZooKeeper. Everything is ordered, so if
processing one request swaps the disk, all other queued requests will
probably do the same. DON'T SWAP.

Be conservative in your estimates: if you have 4G of RAM, do not set the
Java max heap size to 6G or even 4G. For example, it is more likely you
would use a 3G heap for a 4G machine, as the operating system and the cache
also need memory.

This may be more of a Zookeeper-related question, but one more thing: is
there any reason not to run Zookeeper on a virtual machine because of
performance issues?




2013/5/16 Shawn Heisey s...@elyograg.org

 On 5/16/2013 2:34 PM, Furkan KAMACI wrote:

 You have some tips about JVM parameters starting a Solr node. What do you
 have special for Solr when you start a Zookeeper ensemble. i.e. heap size?


 I haven't given it any JVM options.  The ZK process on my primary server
 has a 5GB virtual memory size and is using 131MB of system memory.  If
 you're not going to be creating a large number of collection or replicas
 and you're not using super-large config files, you could probably limit the
 max heap to a pretty small number and be OK.

 Thanks,
 Shawn




Re: SOLR Junit test - How to resolve error - 'thread leaked from SUITE scope'?

2013-05-16 Thread Shawn Heisey

On 5/16/2013 3:05 PM, bbarani wrote:


I am using SOLR 4.3.0...I am currently getting the below error when running
test for custom SOLR components. The tests pass without any issues but I am
getting the below error after the tests are done.. Can someone let me how to
resolve this issue?

thread leaked from SUITE scope at com.solr.activemq.TestWriter:
 [junit]1) Thread[id=19, name=ActiveMQ Scheduler, state=WAITING,
group=TGRP-TestWriter]


It looks like your code incorporates ActiveMQ.  That software apparently 
starts a scheduler thread, and you aren't shutting that down.  I'm 
guessing that part of ActiveMQ initialization is creating some kind of 
scheduler object, and that you will need to call a close() or shutdown() 
method on that object as you wrap things up.
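
For example, if the writer keeps a JMS connection open, closing it when the suite finishes is often enough to satisfy the leak detector. This is only a sketch - the connection field below is an assumption about your test code, not anything from Solr or ActiveMQ:

// Sketch of a suite-level teardown for com.solr.activemq.TestWriter
// (adjust names to however the test actually creates its ActiveMQ resources).
import javax.jms.Connection;
import org.junit.AfterClass;

public class TestWriter {
  static Connection connection; // assumed to be opened once for the suite

  @AfterClass
  public static void closeActiveMq() throws Exception {
    if (connection != null) {
      connection.close(); // closing the connection should also stop its "ActiveMQ Scheduler" thread
      connection = null;
    }
  }
}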


If that doesn't help, you'll need to consult support resources for ActiveMQ.

Thanks,
Shawn



Re: Speed up import of Hierarchical Data

2013-05-16 Thread O. Olson
Thank you Stefan. I am new to Solr and I would need to read up more on
CachedSqlEntityProcessor. Do you have any clue where to begin? There do not
seem to be any tutorials online.

The link you provided seems to have a very short and unclear explanation.
After “Example 1” you have “The usage is exactly same as the other one.”
What does “other one” refer to? I did not understand the description
completely.

This description seems to say that if the query is the same as a prior query
it would be fetched from the cache. In my case each of the Category queries
is unique because it has a unique SKU and Category Level. Would
CachedSqlEntityProcessor then help me?

Thank you,
O. O.



Stefan Matheis-2 wrote
 That sounds like a perfect match for
 http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor :)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Speed-up-import-of-Hierarchical-Data-tp4063924p4064034.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Aggregate word counts over a subset of documents

2013-05-16 Thread David Larochelle
Jason,

Thanks so much for your suggestion. This seems to do what I need.

--

David

On Thu, May 16, 2013 at 3:59 PM, Jason Hellman 
jhell...@innoventsolutions.com wrote:

 David,

 A Pivot Facet could possibly provide these results by the following syntax:

 facet.pivot=category,includes

 We would presume that includes is a tokenized field and thus a set of
 facet values would be rendered from the terms resoling from that
 tokenization.  This would be nested in each category…and, of course, the
 entire set of documents considered for these facets is constrained by the
 current query.

 I think this maps to your requirement.

 Jason

 On May 16, 2013, at 12:29 PM, David Larochelle 
 dlaroche...@cyber.law.harvard.edu wrote:

  Is there a way to get aggregate word counts over a subset of documents?
 
  For example given the following data:
 
   {
 id: 1,
 category: cat1,
 includes: The green car.,
   },
   {
 id: 2,
 category: cat1,
 includes: The red car.,
   },
   {
 id: 3,
 category: cat2,
 includes: The black car.,
   }
 
  I'd like to be able to get total term frequency counts per category. e.g.
 
  category name=cat1
lst name=the2/lst
lst name=car2/lst
lst name=green1/lst
lst name=red1/lst
  /category
  category name=cat2
lst name=the1/lst
lst name=car1/lst
lst name=black1/lst
  /category
 
  I was initially hoping to do this within Solr and I tried using the
  TermFrequencyComponent. This gives term frequencies for individual
  documents and term frequencies for the entire index but doesn't seem to
  help with subsets. For example, TermFrequencyComponent would tell me that
  car occurs 3 times over all documents in the index and 1 time in
 document 1
  but not that it occurs 2 times over cat1 documents and 1 time over cat2
  documents.
 
  Is there a good way to use Solr/Lucene to gather aggregate results like
  this? I've been focusing on just using Solr with XML files but I could
  certainly write Java code if necessary.
 
  Thanks,
 
  David




Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry
Hi Jack,

Thanks for your response again and for helping me out to get through this.

The URL is definitely encoded for spaces and it looks like below. As I
mentioned in my previous mail, I can't add it to query parameter as that
searches on multiple fields.

The title field is defined as below:
<field name="title" type="text_wc" indexed="true" stored="false" multiValued="true"/>

q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,10%29%29&fq=collection:assets

<requestHandler name="assdismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">title^10 description^5 annotations^3 notes^2 categories</str>
    <str name="pf">title</str>
    <int name="ps">0</int>
    <str name="q.alt">*:*</str>
    <str name="fl">*,score</str>
    <str name="mm">100%</str>
    <str name="q.op">AND</str>
    <str name="sort">score desc</str>
    <str name="facet">true</str>
    <str name="facet.limit">-1</str>
    <str name="facet.mincount">1</str>
    <str name="facet.field">uniq_subtype_id</str>
    <str name="facet.field">component_type</str>
    <str name="facet.field">genre_type</str>
  </lst>
  <lst name="appends">
    <str name="fq">collection:assets</str>
  </lst>
</requestHandler>

The term 'countryside' needs to be searched against multiple fields
including titles, descriptions, annotations, categories, notes but the UI
also has a feature to limit results by providing a title field.


I can see that the filter queries are always parsed by LuceneQueryParser
however I'd expect it to generate the parsed_filter_queries debug output in
every situation.

I have tried it as the main query with both edismax and lucene defType and
it gives me correct output and correct results.
But there is some problem when this is used as a filter query, as the
parser is not able to parse a comma with a space.
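
(One thing worth double-checking on the wire, following Jack's point about encoding: when the space is present it has to be percent-encoded in the fq as well, e.g.

fq=%28title%3A%28%2C%2010%29%29

which is the encoded form of fq=(title:(, 10)). If a raw space goes out unencoded, the parameter can be mangled before Solr ever sees it.)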

Thanks again Jack, please let me know in case you need more inputs from my
side.

Best Regards,
Sandeep

On 16 May 2013 18:03, Jack Krupansky j...@basetechnology.com wrote:

 Could you show us the full query URL - spaces must be encoded in URL query
 parameters.

 Also show the actual field XML - you omitted that.

 Try the same query as a main query, using both defType=edismax and
 defType=lucene.

 Note that the filter query is parsed using the Lucene query parser, not
 edismax, independent of the defType parameter. But you don't have any
 edismax features in your fq anyway.

 But you can stick {!edismax} in front of the query to force edismax to be
 used for the fq, although it really shouldn't change anything:

 Also, catenate is fine for indexing, but will mess up your queries at
 query time, so set them to 0 in the query analyzer

 Also, make sure you have autoGeneratePhraseQueries="true" on the field
 type, but that's not the issue here.


 -- Jack Krupansky

 -Original Message- From: Sandeep Mestry
 Sent: Thursday, May 16, 2013 12:42 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Question about Edismax - Solr 4.0


 Thanks Jack for your reply..

 The problem is, I'm finding results for fq=title:(,10) but not for
 fq=title:(, 10) - apologies if that was not clear from my first mail.
 I have already mentioned the debug analysis in my previous mail.

 Additionally, the title field is defined as below:
 <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
             splitOnNumerics="0" preserveOriginal="1" />
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
             splitOnNumerics="0" preserveOriginal="1" />
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 I have set the catenate options to 1 for all types.
 I can understand ',' getting ignored when it is on its own (title:(, 10)), but
 - Why is solr not searching for 10 in that case, just like it did when the
 query was (title:(,10))?
 - And why did the other filter queries (collection:assets) not show up in the
 debug section?


 Thanks,
 Sandeep


 On 16 May 2013 13:57, Jack Krupansky j...@basetechnology.com wrote:

  You haven't indicated any problem here! What is the symptom that you
 actually think is a problem.

 There is no comma operator in any of the Solr query parsers. Comma is just
 another character that may or may not be included or discarded depending
 on
 the specific field type and analyzer. For example, a white space analyzer
 will keep commas, but the standard analyzer or the word delimiter filter
 will discard them. If title were a string type, all punctuation would
 be 

RE: Speed up import of Hierarchical Data

2013-05-16 Thread O. Olson
Thank you James. Are there any examples of SortedMapBackedCache? I am new to
Solr and I do not find many tutorials in this regard. I just modified the
examples and they worked for me.  What is a good way to learn these basics?
O. O.



Dyer, James-2 wrote
 See https://issues.apache.org/jira/browse/SOLR-2943 .  You can set up 2
 DIH handlers.  The first would query the CAT_TABLE and save it to a
 disk-backed cache, using DIHCacheWriter.  You then would replace your 3
 child entities in the 2nd DIH handler to use DIHCacheProcessor to read
 back the cached data.  This is a little complicated to do, but it would
 let you just cache the data once and because it is disk-backed, will scale
 to whatever size the CAT_TABLE is.  (For some details, see this thread:
 http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tt4015514.html)
 
 A simpler method is simply to specify cacheImpl=SortedMapBackedCache on
 the 3 child entities.  (This is the same as using
 CachedSqlEntityProcessor.)  It would generate 3 in-memory caches, each
 with the same data.  If CAT_TABLE is small, this would be adequate.  
 
 In between this would be to create a disk-backed cache Impl (or use the
 ones at SOLR-2613 or SOLR-2948) and specify it on cacheImpl.  It would
 still create 3 identical caches, but they would be disk-backed and could
 scale beyond what in-memory can handle.
 
 James Dyer
 Ingram Content Group
 (615) 213-4311





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Speed-up-import-of-Hierarchical-Data-tp4063924p4064040.html
Sent from the Solr - User mailing list archive at Nabble.com.
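
For the in-memory option James describes (cacheImpl="SortedMapBackedCache" on the child entities), a cached child entity looks roughly like this - table, column and key names are placeholders, not taken from the thread:

<entity name="categories"
        query="select SKU, CAT_LEVEL, CAT_NAME from CAT_TABLE"
        cacheImpl="SortedMapBackedCache"
        cacheKey="SKU"
        cacheLookup="product.SKU">
  <field column="CAT_NAME" name="category" />
</entity>

The whole CAT_TABLE is read once into memory, and subsequent lookups are served from that cache instead of issuing one SQL query per parent row.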


RE: Deleting an entry from a collection when they key has : in it

2013-05-16 Thread Chris Hostetter

: If in my schema, I have the key field set to indexed=false, then is that
: maybe the issue?  I'm going to try to set that to true and rebuild the
: repository and see if that does it.

if a field is indexed=false you can not query on it.

if you can not query on a field, then you can not delete by a query 
against that field.
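
Once the key field is indexed (or for any other indexed field), the colon in the value also needs to be escaped or quoted so it is not read as a field:value separator - for example, with a hypothetical key field named id:

<delete><query>id:some\:key</query></delete>
or
<delete><query>id:"some:key"</query></delete>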


-Hoss


Re: SurroundQParser does not analyze the query text

2013-05-16 Thread Erik Hatcher
The issue can certainly be solved.  But to me, it's actually a bit of a 
feature by design for the Lucene-level surround query parser to not do 
analysis, as it seems to have been meant for advanced query writers to piece 
together sophisticated SpanQuery-based pattern matching kinds of things 
utilizing their knowledge of how text was analyzed and indexed.

But for sure it could be modified to do analysis, probably using the 
multiterm analysis support that now exists elsewhere.  I looked into this 
when I did the basic work of integrating the surround query parser, and 
determined it was a lot of work because it'd need changes in the Lucene level 
code to leverage analysis, and then glue at the Solr level to be field type 
aware and savvy.

By all means open a JIRA and contribute!

Workaround?  Client-side calls can be made to analyze text, and the client-side 
could build up a query expression based on term-by-term (or phrase) analysis 
results.  Maybe that means a prohibitive number of requests to Solr to build up 
a query in a way that leverages Solr's field type analysis settings, but it is 
a technologically possible technique maybe worth considering.
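
In the meantime, both of the original requirements can at least be expressed in surround syntax, as long as the query terms are already in their indexed (analyzed) form - the terms and distances below are only examples, with df or the schema default field deciding which field is searched:

q={!surround}3N(apple, orange)
q={!surround}5N(OR(apple, pear), OR(red, green))

N is the unordered proximity operator and W the ordered one; the second query asks for one word from each list within the given distance.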

Erik



On May 16, 2013, at 16:38 , Isaac Hebsh wrote:

 Hi,
 
 I'm trying to use Surround Query Parser for two reasons, which are not
 covered by proximity slops:
 1. find documents with two words within a given distance, *unordered*
 2. given two lists of words, find documents with (at least) one word from
 list A and (at least) one word from list B, within a given distance.
 
 The surround query parser looks great, but it have one big drawback - It
 does not analyze the query text. It is documented in the [weak :(] wiki
 page.
 
 Can this issue be solved somehow, or it is a bigger constraint?
 Should I open a JIRA issue for this?
 Any work-around?



Re: SOLR test framework- ERROR: SolrIndexSearcher opens=1 closes=0

2013-05-16 Thread bbarani
Thanks a lot for your response.

I figured out that I was not closing the LocalSolrQueryRequest after handling
the response. The error got resolved after closing the request object.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-test-framework-ERROR-SolrIndexSearcher-opens-1-closes-0-tp4063940p4064044.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr httpCaching for distinct handlers

2013-05-16 Thread mat0112
Hi everybody, I would like to have distinct httpCaching configurations for
distinct handlers, i.e. if a request comes in for select, send a cache-control
header of 1 minute; and if a request comes in for mlt, send a cache-control
header of 5 minutes.
Is there a way to do that in my solrconfig.xml?
Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-httpCaching-for-distinct-handlers-tp4064050.html
Sent from the Solr - User mailing list archive at Nabble.com.


Null identity service When Running Solr 4.2.1 with log4j

2013-05-16 Thread Furkan KAMACI
I have Solr 4.2.1 and want to use log4j. I have followed the wiki. Here are my
jar versions:

java -jar start.jar --version
Active Options: [default, *]
Version Information on 15 entries in the classpath.
Note: order presented here is how they would appear on the classpath.
changes to the OPTIONS=[option,option,...] command line option will be
reflected here.
0: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-xml-8.1.8.v20121106.jar
1: 3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
2: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-http-8.1.8.v20121106.jar
3: 8.1.8.v20121106 |
${jetty.home}/lib/jetty-continuation-8.1.8.v20121106.jar
4: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-server-8.1.8.v20121106.jar
5: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-security-8.1.8.v20121106.jar
6: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-servlet-8.1.8.v20121106.jar
7: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-webapp-8.1.8.v20121106.jar
8: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-deploy-8.1.8.v20121106.jar
9: 1.7.5 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.7.5.jar
10: 1.2.17 | ${jetty.home}/lib/ext/log4j-1.2.17.jar
11: 1.7.5 | ${jetty.home}/lib/ext/slf4j-api-1.7.5.jar
12: 1.7.5 | ${jetty.home}/lib/ext/slf4j-log4j12-1.7.5.jar
13: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-util-8.1.8.v20121106.jar
14: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-io-8.1.8.v20121106.jar

I created a log4j.properties under etc folder and this is inside of it:

# Logging level

log4j.rootLogger=WARN, file


#- size rotation with log cleanup.

log4j.appender.file=org.apache.log4j.RollingFileAppender

log4j.appender.file.MaxFileSize=4MB

log4j.appender.file.MaxBackupIndex=9


#- File to log to and log format

log4j.appender.file.File=logs/solr.log

log4j.appender.file.layout=org.apache.log4j.PatternLayout

log4j.appender.file.layout.ConversionPattern=%-5p - %d{-MM-dd
HH:mm:ss.SSS}; %C; %m\n

When I run start.jar I get that:


java -Dlog4j.debug
-Dlog4j.configuration=file:home/kk/Desktop/preprop/etc/log4j.properties
-jar start.jar

log4j: Using URL [file:home/kk/Desktop/preprop/etc/log4j.properties]
for automatic log4j configuration.
log4j: Reading configuration from URL
file:home/kk/Desktop/preprop/etc/log4j.properties
log4j: Parsing for [root] with value=[WARN, file].
log4j: Level token is [WARN].
log4j: Category root set to WARN
log4j: Parsing appender named file.
log4j: Parsing layout options for file.
log4j: Setting property [conversionPattern] to [%-5p - %d{-MM-dd
HH:mm:ss.SSS}; %C; %m
].
log4j: End of parsing for file.
log4j: Setting property [maxBackupIndex] to [9].
log4j: Setting property [file] to [logs/solr.log].
log4j: Setting property [maxFileSize] to [4MB].
log4j: setFile called: logs/solr.log, true
log4j: setFile ended
log4j: Parsed file options.
log4j: Finished configuring.
*Null identity service, trying login service: null
Finding identity service: null*

What am I missing?


Re: Null identity service When Running Solr 4.2.1 with log4j

2013-05-16 Thread Furkan KAMACI
When I check under logs folder I see that there is a file called solr.log
and has that line:

WARN - 2013-05-17 02:16:47.688; org.apache.solr.core.CoreContainer; Log
watching is not yet implemented for log4j


2013/5/17 Furkan KAMACI furkankam...@gmail.com

 I have Solr 4.2.1 and want to use log4j. I have followed wiki. Here are my
 jar versions:

 java -jar start.jar --version
 Active Options: [default, *]
 Version Information on 15 entries in the classpath.
 Note: order presented here is how they would appear on the classpath.
 changes to the OPTIONS=[option,option,...] command line option will be
 reflected here.
 0: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-xml-8.1.8.v20121106.jar
 1: 3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
 2: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-http-8.1.8.v20121106.jar
 3: 8.1.8.v20121106 |
 ${jetty.home}/lib/jetty-continuation-8.1.8.v20121106.jar
 4: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-server-8.1.8.v20121106.jar
 5: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-security-8.1.8.v20121106.jar
 6: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-servlet-8.1.8.v20121106.jar
 7: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-webapp-8.1.8.v20121106.jar
 8: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-deploy-8.1.8.v20121106.jar
 9: 1.7.5 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.7.5.jar
 10: 1.2.17 | ${jetty.home}/lib/ext/log4j-1.2.17.jar
 11: 1.7.5 | ${jetty.home}/lib/ext/slf4j-api-1.7.5.jar
 12: 1.7.5 | ${jetty.home}/lib/ext/slf4j-log4j12-1.7.5.jar
 13: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-util-8.1.8.v20121106.jar
 14: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-io-8.1.8.v20121106.jar

 I created a log4j.properties under etc folder and this is inside of it:

 # Logging level

 log4j.rootLogger=WARN, file


 #- size rotation with log cleanup.

 log4j.appender.file=org.apache.log4j.RollingFileAppender

 log4j.appender.file.MaxFileSize=4MB

 log4j.appender.file.MaxBackupIndex=9


 #- File to log to and log format

 log4j.appender.file.File=logs/solr.log

 log4j.appender.file.layout=org.apache.log4j.PatternLayout

 log4j.appender.file.layout.ConversionPattern=%-5p - %d{-MM-dd
 HH:mm:ss.SSS}; %C; %m\n

 When I run start.jar I get that:


 java -Dlog4j.debug
 -Dlog4j.configuration=file:home/kk/Desktop/preprop/etc/log4j.properties
 -jar start.jar

 log4j: Using URL [file:home/kk/Desktop/preprop/etc/log4j.properties]
 for automatic log4j configuration.
 log4j: Reading configuration from URL
 file:home/kk/Desktop/preprop/etc/log4j.properties
 log4j: Parsing for [root] with value=[WARN, file].
 log4j: Level token is [WARN].
 log4j: Category root set to WARN
 log4j: Parsing appender named file.
 log4j: Parsing layout options for file.
 log4j: Setting property [conversionPattern] to [%-5p - %d{-MM-dd
 HH:mm:ss.SSS}; %C; %m
 ].
 log4j: End of parsing for file.
 log4j: Setting property [maxBackupIndex] to [9].
 log4j: Setting property [file] to [logs/solr.log].
 log4j: Setting property [maxFileSize] to [4MB].
 log4j: setFile called: logs/solr.log, true
 log4j: setFile ended
 log4j: Parsed file options.
 log4j: Finished configuring.
 *Null identity service, trying login service: null
 Finding identity service: null*

 What I am missing?




Controlling which node(s) hold(s) a collection

2013-05-16 Thread Otis Gospodnetic
Hi,

Is it possible to control on which node(s) a collection should be placed?

I've looked at http://wiki.apache.org/solr/SolrCloud and
http://wiki.apache.org/solr/CoreAdmin and have searched the ML
archives, but couldn't find any mentions of that.

Use case:
* Want to use SolrCloud for large indices that I want to shard and replicate
* Have a number of smaller indices that need to live in the same
cluster, but that I don't want to shard - queries are fast when
executed against the whole index being on a single server, and they
use join and pivot faceting, neither of which works with sharded
indices

I have 30+ such non-shardable indices of varying sizes and I want to
make sure they are distributed over all cluster nodes nice and evenly.
 I'm assuming there is no better way than to manually control
placement of my 1-shard collections (if that's even doable), but if
there is a better way, I'm all eyeballs!

Thanks,
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


Re: Null identity service When Running Solr 4.2.1 with log4j

2013-05-16 Thread Shawn Heisey

On 5/16/2013 5:22 PM, Furkan KAMACI wrote:

*Null identity service, trying login service: null
Finding identity service: null*

What I am missing?


That's a message from jetty that has nothing to do with Solr.

https://bugs.eclipse.org/bugs/show_bug.cgi?id=396295

You'll probably need to upgrade your jetty version to get rid of it, but 
it's harmless.


 When I check under logs folder I see that there is a file called
 solr.log and has that line:
 WARN - 2013-05-17 02:16:47.688; org.apache.solr.core.CoreContainer; 
Log watching is not yet implemented for log4j


This is normal for 4.2.1 - it means that you can't view the log in the 
admin UI, because the UI doesn't support log4j.


You'll find that with your logging level set to WARN, Solr logs next to 
nothing - that message about the log watching may be the only message 
you see.


Thanks,
Shawn



Re: Null identity service When Running Solr 4.2.1 with log4j

2013-05-16 Thread Furkan KAMACI
Thanks Shawn. I had just wondered how other people could have used log4j
with 4.2.1, because there is a paragraph on Using log4j with Solr from
source, 4.2.1 or earlier in the wiki.

2013/5/17 Shawn Heisey s...@elyograg.org

 On 5/16/2013 5:22 PM, Furkan KAMACI wrote:

 *Null identity service, trying login service: null
 Finding identity service: null*

 What I am missing?


 That's a message from jetty that has nothing to do with Solr.

 https://bugs.eclipse.org/bugs/show_bug.cgi?id=396295

 You'll probably need to upgrade your jetty version to get rid of it, but
 it's harmless.


  When I check under logs folder I see that there is a file called
  solr.log and has that line:
  WARN - 2013-05-17 02:16:47.688; org.apache.solr.core.CoreContainer; Log
 watching is not yet implemented for log4j

 This is normal for 4.2.1 - it means that you can't view the log in the
 admin UI, because the UI doesn't support log4j.

 You'll find that with your logging level set to WARN, Solr logs next to
 nothing - that message about the log watching may be the only message you
 see.

 Thanks,
 Shawn




Re: Controlling which node(s) hold(s) a collection

2013-05-16 Thread Mark Miller
You can control this simply with the CoreAdmin api - the core is created at the 
location of whatever url you use… simply fire the creates at whatever nodes you 
want the collection to live on.

The collections api also takes a list of node names to use, optionally.
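
For example (host names and collection name below are placeholders), a one-shard collection pinned to two specific nodes via the Collections API:

http://host1:8983/solr/admin/collections?action=CREATE&name=smallindex&numShards=1&replicationFactor=2&createNodeSet=host1:8983_solr,host2:8983_solr

or the CoreAdmin equivalent, fired once at each node that should hold a replica:

http://host1:8983/solr/admin/cores?action=CREATE&name=smallindex_shard1_replica1&collection=smallindex&shard=shard1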

- Mark

On May 16, 2013, at 7:34 PM, Otis Gospodnetic otis.gospodne...@gmail.com 
wrote:

 Hi,
 
 Is it possible to control on which node(s) a collection should be placed?
 
 I've looked at http://wiki.apache.org/solr/SolrCloud and
 http://wiki.apache.org/solr/CoreAdmin and have searched the ML
 archives, but couldn't find any mentions of that.
 
 Use case:
 * Want to use SolrCloud for large indices that I want to shard and replicate
 * Have a number of smaller indices that need to live in the same
 cluster, but that I don't want to shard - queries are fast when
 executed against the whole index being on a single server, and they
 use join and pivot faceting, neither of which works with sharded
 indices
 
 I have 30+ such non-shardable indices of varying sizes and I want to
 make sure they are distributed over all cluster nodes nice and evenly.
 I'm assuming there is no better way than to manually control
 placement of my 1-shard collections (i that's even doable), but if
 there is a better way, I'm all eyeballs!
 
 Thanks,
 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html



Re: Null identity service When Running Solr 4.2.1 with log4j

2013-05-16 Thread Furkan KAMACI
Ok, I have used 4.3.0's jetty and lib folder (of course plus
log4j.properties) and it works with 4.2.1 now.

2013/5/17 Furkan KAMACI furkankam...@gmail.com

 Thanks Shawn. I have just wondered that how other people could used log4j
 with 4.2.1 because of there is a paragraph for Using log4j with Solr from
 source, 4.2.1 or earlier at wiki.


 2013/5/17 Shawn Heisey s...@elyograg.org

 On 5/16/2013 5:22 PM, Furkan KAMACI wrote:

 *Null identity service, trying login service: null
 Finding identity service: null*

 What I am missing?


 That's a message from jetty that has nothing to do with Solr.

 https://bugs.eclipse.org/bugs/show_bug.cgi?id=396295

 You'll probably need to upgrade your jetty version to get rid of it, but
 it's harmless.


  When I check under logs folder I see that there is a file called
  solr.log and has that line:
  WARN - 2013-05-17 02:16:47.688; org.apache.solr.core.CoreContainer; Log
 watching is not yet implemented for log4j

 This is normal for 4.2.1 - it means that you can't view the log in the
 admin UI, because the UI doesn't support log4j.

 You'll find that with your logging level set to WARN, Solr logs next to
 nothing - that message about the log watching may be the only message you
 see.

 Thanks,
 Shawn




