Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine

Hi,

I am trying Solr and I have one question. In the schema that I set up,
there are 10 fields that always contain the same data (hierarchical taxonomies),
but with 4 million
documents the disk space and indexing time must be large. I need these fields
for auto-complete. Is there another way to do this type of operation?


Damien


Re: Taxonomy in SOLR

2011-01-24 Thread Em

Hi Damien,

can you provide a schema sample plus example-data?
Since your information is really general, I think no one can give you
situation-specific advice.

Regards


Re: Getting started with writing parser

2011-01-24 Thread Dinesh

my solrconfig.xml

http://pastebin.com/XDg0L4di

my schema.xml

http://pastebin.com/3Vqvr3C0

my try.xml

http://pastebin.com/YWsB37ZW

-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.


RE: DIH serialize

2011-01-24 Thread Papp Richard
Hi Dennis,

  thank you for your answer, but I didn't understand why you say it doesn't need
serialization. I'm with option C.
  But the main question is: how do I put the result of many fields
(SELECT * FROM ...) into one field?

thanks,
  Rich

-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Monday, January 24, 2011 02:07
To: solr-user@lucene.apache.org
Subject: Re: DIH serialize

Depends on your process chain to the eventual viewer/consumer of the data.

The questions to ask are:

  A/ Is the data IN Solr going to be viewed or processed in its original form?
     --set stored=true
     --no serialization needed.

  B/ If it's going to be analyzed and searched for separately from any other field,
     the analyzing will put it into an unreadable form. If you need to see it, then
     --set indexed=true and stored=true
     --no serialization needed.

  C/ If it's NOT going to be viewed AS IS, and it's not going to be searched for AS IS
     (i.e. other columns will be how the data is found), and you have another,
     serializable format:
     --set indexed=false and stored=true
     --serialize AS PER THE INTENDED APPLICATION;
       not sure that Solr can do that at all.

  D/ If it's NOT going to be viewed AS IS, BUT it's going to be searched for AS IS
     (this column will be how the data is found), and you have another,
     serializable format:
     --you need to put it into TWO columns:
     --A SERIALIZED FIELD
       --set indexed=false and stored=true
     --AN UNSERIALIZED FIELD
       --set indexed=true and stored=false
     --serialize AS PER THE INTENDED APPLICATION;
       not sure that Solr can do that at all.
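(In schema terms, a minimal sketch of case D, with hypothetical field names:

   <field name="payload" type="string" indexed="false" stored="true"/>
   <field name="payload_search" type="text" indexed="true" stored="false"/>

The first field carries the serialized form back to the application; the second
is what queries actually match against.)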

Hope that helps!


Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Papp Richard ccode...@gmail.com
To: solr-user@lucene.apache.org
Sent: Sun, January 23, 2011 2:02:05 PM
Subject: DIH serialize

Hi all,



  I wasted the last few hours trying to serialize some column values (from
MySQL) into a Solr column, but I just can't find such a function. I'll use
the value in PHP - I don't know if it is possible to serialize in PHP style
at all. This is what I tried, and it works to a given degree:



in schema.xml:

   <field name="main_timetable" type="text" indexed="false"
          stored="true" multiValued="true" />



in DIH xml:



<dataConfig>

  <script><![CDATA[
  function my_serialize(row)
  {
    row.put('main_timetable', row.toString());
    return row;
  }
  ]]></script>

.

  <entity name="main_timetable" query="
    SELECT * FROM shop_time_table stt WHERE stt.shop_id = '${shop.id}';"
    transformer="script:my_serialize">

.

 



  Can I use Java directly in the script (<script language="Java">)?

  How could I achieve this? Or any other idea?

  I need these values together (from a row) and I need them in PHP to handle
the result easily.
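One possibility (a rough sketch, untested, using the default JavaScript
ScriptTransformer; the column names open_time and close_time are made up)
is to build a JSON string by hand, so PHP can read it back with json_decode():

  <script><![CDATA[
  function my_serialize(row)
  {
    // build a JSON object string from selected columns;
    // PHP can decode it with json_decode()
    var json = '{"open":"' + row.get('open_time') +
               '","close":"' + row.get('close_time') + '"}';
    row.put('main_timetable', json);
    return row;
  }
  ]]></script>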



thanks,

  Rich
 




Re: Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine

My schema :

<field name="id" type="string" indexed="true" stored="true" required="true" />

<!-- Document -->
<field name="lead" type="string" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" required="true" />
<field name="text" type="string" indexed="true" stored="true" required="true" />

<!-- taxo -->
<dynamicField name="*_taxon_label" type="string" indexed="true" stored="true" />
<dynamicField name="*_taxon_type" type="string" indexed="true" stored="true" />
<dynamicField name="*_taxon_hierarchy" type="string" indexed="true" stored="true"
              multiValued="true" />

<field name="type" type="string" indexed="true" stored="true" required="true" />


On 24/01/2011 09:56, Em wrote:

Hi Damien,

can you provide a schema sample plus example-data?
Since your information is really general, I think no one can give you
situation-specific advice.

Regards




Re: Taxonomy in SOLR

2011-01-24 Thread Em

Hi Damien,

why are you storing the taxonomies?
When it comes to faceting, it only depends on indexed values. If there is a
meaningful difference between the indexed and the stored value, I would
prefer to use an RDBMS or something like that to reduce redundancy.
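(In schema terms, a sketch of an indexed-only variant of such a field:

   <dynamicField name="*_taxon_hierarchy" type="string" indexed="true"
                 stored="false" multiValued="true" />
)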

Does this help?

Regards


Indexing spatial columns

2011-01-24 Thread mapbutcher

Hi,

I'm a bit of a Solr beginner. I have installed Solr 4.0 and I'm trying to
index some spatial data stored in a SQL Server instance. I'm using the
DataImportHandler; here is my data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost\sqlserver08;databaseName=Spatial"
              user="sa" password="sqlserver08"/>
  <document>
    <entity name="poi" query="select OBJECTID,CATEGORY,NAME,POINT_X,POINT_Y
                              from NZ_POI">
      <field column="OBJECTID" name="id"/>
      <field column="CATEGORY" name="category"/>
      <field column="NAME" name="name"/>
      <field column="POINT_X" name="lat"/>
      <field column="POINT_Y" name="lon"/>
    </entity>
  </document>
</dataConfig>

In my schema file I have following definition:

   <field name="category" type="string" indexed="true" stored="true"/>
   <field name="name" type="string" indexed="true" stored="true"/>
   <field name="lat" type="tdouble" indexed="true" stored="true"/>
   <field name="lon" type="tdouble" indexed="true" stored="true"/>

   <copyField source="category" dest="text"/>
   <copyField source="name" dest="text"/>

I have completed a data import with no errors in the log as far as I can
tell. However, when I inspect the schema I do not see the column names
lat\lon. When sending the query:

http://localhost:8080/Solr/select/?q=Camp AND _val_:recip(dist(2, lon, lat,
44.794, -93.2696), 1, 1, 0)^100

I get an "undefined column" error.

Does anybody have any ideas about whether the above is the correct procedure
for indexing spatial data?

Cheers

S




Re: Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine

Yes, I am not obliged to store the taxonomies.

My taxonomies look like:

english_taxon_label = Berlin
english_taxon_type = location
english_taxon_hierarchy = 0/world
                          1/world/europe
                          2/world/europe/germany
                          3/world/europe/germany/berlin

I need *_taxon_hierarchy for faceting and the label for auto-complete.

With an RDBMS, I have 100 entries max for one taxonomy, but with Solr and 4
million documents the redundancy is huge, no?

And I have 10 different taxonomies per document ...

Damien

On 24/01/2011 10:30, Em wrote:

Hi Damien,

why are you storing the taxonomies?
When it comes to faceting, it only depends on indexed values. If there is a
meaningful difference between the indexed and the stored value, I would
prefer to use an RDBMS or something like that to reduce redundancy.

Does this help?

Regards




Re: Taxonomy in SOLR

2011-01-24 Thread Em

100 entries per taxon?
Well, with Solr you get 100 taxon entries * 4 million docs * 10 taxonomies.
If your indexed taxon versions look okay, you could leave out the
DB overhead and do everything in Solr.




Re: Delta Import occasionally missing records.

2011-01-24 Thread btucker

Thank you for your response.

In what way is 'timestamp' not perfect?

I've looked into the SolrEntityProcessor and added a timestamp field to our
index.
However, I'm struggling to work out a query to get the max value of the
timestamp field,
and does the SolrEntityProcessor entity appear before the root entity, or
does it wrap around the root entity?
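One way to get the latest indexed timestamp (a sketch, assuming the timestamp
field is indexed so it can be sorted on) is to ask Solr for the single newest
document and read the field back:

  http://localhost:8983/solr/select?q=*:*&sort=timestamp+desc&rows=1&fl=timestamp

In DIH, a nested entity can reference its parent's fields via
${entityname.fieldname}, so the SOLR-1499 entity would presumably sit outside
the entity that needs the value rather than inside it.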

On 22 January 2011 07:24, Lance Norskog-2 [via Lucene] 
ml-node+2307215-627680969-326...@n3.nabble.com wrote:

 The timestamp thing is not perfect. You can instead do a search
 against Solr and find the latest timestamp in the index. SOLR-1499
 allows you to search against Solr in the DataImportHandler.

 On Fri, Jan 21, 2011 at 2:27 AM, btucker [hidden email]
 wrote:

 
  Hello
 
  We've just started using solr to provide search functionality for our
  application with the DataImportHandler performing a delta-import every 1
  minute, fired by crontab, which works great; however, it does occasionally miss
  records that are added to the database while the delta-import is running.

 
  Our data-config.xml has the following queries in its root entity:
 
  query="SELECT id, date_published, date_created, publish_flag FROM Item WHERE
         id > 0
         AND record_type_id=0
         ORDER BY id DESC"
  preImportDeleteQuery="SELECT item_id AS Id FROM
         gnpd_production.item_deletions"
  deletedPkQuery="SELECT item_id AS id FROM gnpd_production.item_deletions
         WHERE deletion_date >=
         SUBDATE('${dataimporter.last_index_time}', INTERVAL 5 MINUTE)"
  deltaImportQuery="SELECT id, date_published, date_created, publish_flag FROM
         Item WHERE id > 0
         AND record_type_id=0
         AND id=${dataimporter.delta.id}
         ORDER BY id DESC"
  deltaQuery="SELECT id, date_published, date_created, publish_flag FROM Item
         WHERE id > 0
         AND record_type_id=0
         AND sys_time_stamp >=
         SUBDATE('${dataimporter.last_index_time}', INTERVAL 1 MINUTE)
         ORDER BY id DESC"
 
  I think the problem I'm having comes from the way Solr stores the
  last_index_time in conf/dataimport.properties, as stated on the wiki:

  "When delta-import command is executed, it reads the start time stored in
  conf/dataimport.properties. It uses that timestamp to run delta queries and,
  after completion, updates the timestamp in conf/dataimport.properties."

  Which to me seems to indicate that any records with a timestamp between
  when the dataimport starts and ends will be missed, as the last_index_time
  is set to when it completes the import.
 
  This doesn't seem quite right to me. I would have expected the
  last_index_time to refer to when the dataimport was last STARTED so that
  there were no gaps in the timestamps covered.

  I changed the deltaQuery of our config to include the SUBDATE by INTERVAL 1
  MINUTE statement to alleviate this problem, but it only covers times
  when the delta-import takes less than a minute.

  Any ideas as to how this can be overcome, other than increasing the
  INTERVAL to something larger?
 
  Regards
 
  Barry Tucker
 



 --
 Lance Norskog
 [hidden email]







please help Problem with dataImportHandler

2011-01-24 Thread Dinesh

this is the error that I'm getting... no idea what it is...


/apache-solr-1.4.1/example/exampledocs# java -jar post.jar sample.txt 
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file sample.txt
SimplePostTool: FATAL: Solr returned an error:
Severe_errors_in_solr_configuration__Check_your_log_files_for_more_detailed_information_on_what_may_be_wrong__If_you_want_solr_to_continue_after_configuration_errors_changeabortOnConfigurationErrorfalseabortOnConfigurationError__in_null___orgapachesolrhandlerdataimportDataImportHandlerException_Exception_occurred_while_initializing_context__at_orgapachesolrhandlerdataimportDataImporterloadDataConfigDataImporterjava190__at_orgapachesolrhandlerdataimportDataImporterinitDataImporterjava101__at_orgapachesolrhandlerdataimportDataImportHandlerinformDataImportHandlerjava113__at_orgapachesolrcoreSolrResourceLoaderinformSolrResourceLoaderjava508__at_orgapachesolrcoreSolrCoreinitSolrCorejava588__at_orgapachesolrcoreCoreContainer$InitializerinitializeCoreContainerjava137__at_orgapachesolrservletSolrDispatchFilterinitSolrDispatchFilterjava83__at_orgmortbayjettyservletFilterHolderdoStartFilterHolderjava99__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyservletServletHandlerinitializeServletHandlerjava594__at_orgmortbayjettyservletContextstartContextContextjava139__at_orgmortbayjettywebappWebAppContextstartContextWebAppContextjava1218__at_orgmortbayjettyhandlerContextHandlerdoStartContextHandlerjava500__at_orgmortbayjettywebappWebAppContextdoStartWebAppContextjava448__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyhandlerHandlerCollectiondoStartHandlerCollectionjava147__at_orgmortbayjettyhandlerContextHandlerCollectiondoStartContextHandlerCollectionjava161__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyhandlerHandlerCollectiondoStartHandlerCollectionjava147__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyhan
root@karunya-desktop:/home/karunya/apache-solr-1.4.1/example/exampledocs# 


-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.


Re: please help Problem with dataImportHandler

2011-01-24 Thread Ezequiel Calderara
This may be a dumb question, but Is the xml encoded in UTF-8?

On Mon, Jan 24, 2011 at 7:08 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:


 this is the error that i'm getting.. no idea of what is it..


 /apache-solr-1.4.1/example/exampledocs# java -jar post.jar sample.txt
 SimplePostTool: version 1.2
 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
 other encodings are not currently supported
 SimplePostTool: POSTing files to http://localhost:8983/solr/update..
 SimplePostTool: POSTing file sample.txt
 SimplePostTool: FATAL: Solr returned an error:

 Severe_errors_in_solr_configuration__Check_your_log_files_for_more_detailed_information_on_what_may_be_wrong__If_you_want_solr_to_continue_after_configuration_errors_changeabortOnConfigurationErrorfalseabortOnConfigurationError__in_null___orgapachesolrhandlerdataimportDataImportHandlerException_Exception_occurred_while_initializing_context__at_orgapachesolrhandlerdataimportDataImporterloadDataConfigDataImporterjava190__at_orgapachesolrhandlerdataimportDataImporterinitDataImporterjava101__at_orgapachesolrhandlerdataimportDataImportHandlerinformDataImportHandlerjava113__at_orgapachesolrcoreSolrResourceLoaderinformSolrResourceLoaderjava508__at_orgapachesolrcoreSolrCoreinitSolrCorejava588__at_orgapachesolrcoreCoreContainer$InitializerinitializeCoreContainerjava137__at_orgapachesolrservletSolrDispatchFilterinitSolrDispatchFilterjava83__at_orgmortbayjettyservletFilterHolderdoStartFilterHolderjava99__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyservletServletHandlerinitializeServletHandlerjava594__at_orgmortbayjettyservletContextstartContextContextjava139__at_orgmortbayjettywebappWebAppContextstartContextWebAppContextjava1218__at_orgmortbayjettyhandlerContextHandlerdoStartContextHandlerjava500__at_orgmortbayjettywebappWebAppContextdoStartWebAppContextjava448__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyhandlerHandlerCollectiondoStartHandlerCollectionjava147__at_orgmortbayjettyhandlerContextHandlerCollectiondoStartContextHandlerCollectionjava161__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyhandlerHandlerCollectiondoStartHandlerCollectionjava147__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyhan
 root@karunya-desktop:/home/karunya/apache-solr-1.4.1/example/exampledocs#


 -
 DINESHKUMAR . M
 I am neither especially clever nor especially gifted. I am only very, very
 curious.




-- 
__
Ezequiel.

Http://www.ironicnet.com


Re: please help Problem with dataImportHandler

2011-01-24 Thread Dinesh

actually it's a log file; I separately created a handler for that... it's not
XML

-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.


Re: Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine

Thanks Em,

How can I calculate the index time, update time, and disk space used by one
taxonomy?


On 24/01/2011 10:58, Em wrote:

100 entries per taxon?
Well, with Solr you get 100 taxon entries * 4 million docs * 10 taxonomies.
If your indexed taxon versions look okay, you could leave out the
DB overhead and do everything in Solr.






How is data replicated from Master to Slave?

2011-01-24 Thread dhanesh

Hi,
I'm currently facing an issue with SOLR (specifically with slave
replication) and after having spent quite a bit of time reading online I
find myself having to ask for some enlightenment.
To be more factual, here is the context that led me to this question.
If the website administrator edits an existing category name, then I
need to re-index all the documents with the newly edited category.
Suppose the category is linked with more than 10 million records. I need
to re-index all 10 million documents in SOLR.

In the case of MySQL, the master server writes updates to
its binary log files and maintains an index of those files. These binary
log files serve as a record of updates to be sent to slave servers.
My question is: in SOLR, how is the data replicated from master to slave?
I'd like to know the internal process of data replication.
Is that huge amount of data (10 million records) copied from master
to slave?
This is my first work with Solr, so I'm not sure how to tackle this issue.

Regds
dhanesh s.r



fieldType textgen. tokens > 2

2011-01-24 Thread stockii

Hello.

my field sender with fieldType=textgen cannot find any documents which are
more than 2 tokens long.

  q=sender:name1 name2 name3  =>  0 documents found

WHY ???

that is my field (original from the default schema.xml)

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

-
--- System


One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents, other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 4GB Xmx
- Solr2 for Update-Request  - delta every 2 Minutes - 4GB Xmx


Re: fieldType textgen. tokens > 2

2011-01-24 Thread Markus Jelsma
This is not the fieldType but your query that is giving you trouble. You only
specify the field name for the value name1, so Solr will use the defaultField for
values name2 and name3. You also omitted an operator, so Solr will use the
defaultOperator instead. See your schema.xml for the values of these defaults,
and use debugQuery=true to, well, debug queries.
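For example, with the standard query parser the field prefix binds only to the
first term:

  q=sender:name1 name2 name3              parses as sender:name1 <default>:name2 <default>:name3
  q=sender:(name1 name2 name3)            applies sender to all three terms
  q=sender:(name1 AND name2 AND name3)    also makes the operator explicit

(Here <default> stands for whatever defaultSearchField your schema.xml declares.)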

On Monday 24 January 2011 11:48:07 stockii wrote:
 Hello.
 
 my field sender with fieldType=textgen cannot find any documents which are
 more than 2 tokens long.
 
   q=sender:name1 name2 name3  =>  0 documents found
 
 WHY ???
 
 that is my field (original from the default schema.xml)
 
 <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 
 -
 --- System
 
 
 One Server, 12 GB RAM, 2 Solr Instances, 7 Cores,
 1 Core with 31 Million Documents, other Cores < 100.000
 
 - Solr1 for Search-Requests - commit every Minute  - 4GB Xmx
 - Solr2 for Update-Request  - delta every 2 Minutes - 4GB Xmx

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: How is data replicated from Master to Slave?

2011-01-24 Thread Markus Jelsma
It's all explained on the wiki:
http://wiki.apache.org/solr/SolrReplication#How_does_the_slave_replicate.3F
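A typical slave configuration from that page looks like the sketch below (the
host name and poll interval are placeholders). The slave polls the master,
compares index versions, and pulls only the segment files that changed, not the
whole index; only after an optimize does it effectively fetch everything:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master_host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>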


On Monday 24 January 2011 11:25:45 dhanesh wrote:
 Hi,
 I'm currently facing an issue with SOLR (specifically with slave
 replication) and after having spent quite a bit of time reading online I
 find myself having to ask for some enlightenment.
 To be more factual, here is the context that led me to this question.
 If the website administrator edits an existing category name, then I
 need to re-index all the documents with the newly edited category.
 Suppose the category is linked with more than 10 million records. I need
 to re-index all 10 million documents in SOLR.
 
 In the case of MySQL, the master server writes updates to
 its binary log files and maintains an index of those files. These binary
 log files serve as a record of updates to be sent to slave servers.
 My question is: in SOLR, how is the data replicated from master to slave?
 I'd like to know the internal process of data replication.
 Is that huge amount of data (10 million records) copied from master
 to slave?
 This is my first work with Solr, so I'm not sure how to tackle this issue.
 
 Regds
 dhanesh s.r

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: fieldType textgen. tokens > 2

2011-01-24 Thread stockii

that is my query:  q=sender:name1+name2+name3
exactly, the request is:
q=sender:(name1+name2+name3+OR+sender_2:name1+name2+name3)

so Solr is using another field for name2 and name3?

debugging cannot help me, or I don't understand the debugging ...
when I search only for name1 + name2 the search is okay, but with name3 it is
not ...
in my test environment I used the same fieldType and it works fine...

-
--- System


One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents, other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 4GB Xmx
- Solr2 for Update-Request  - delta every 2 Minutes - 4GB Xmx


Migrating from 1.4.0 to 1.4.1 solr

2011-01-24 Thread Prasad Joshi
Hi,
I want to migrate from 1.4.0 to 1.4.1. I tried keeping the same conf for the
cores as in 1.4.0, added the relevant core names in solr.xml and restarted
Solr, but the old cores don't show up in the browser at localhost:8983. There
were a few cores in examples/multicore/ in the Solr 1.4.1 source from where I
downloaded it; these cores, when included in solr.xml, do show up in the browser.

Please do let me know the reason. Is there anything I need to do for the core
migration? I don't have any data in these cores. Also, if there were data, is
there a nice way of migrating from 1.4.0 to 1.4.1 (which does not involve
reindexing)?
Regards,
Prasad


Re: Migrating from 1.4.0 to 1.4.1 solr

2011-01-24 Thread Markus Jelsma
We can't guess what's wrong with the cores but you need to reindex anyway:
http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/CHANGES.txt

On Monday 24 January 2011 12:06:10 Prasad Joshi wrote:
 Hi,
 I want to migrate from 1.4.0 to 1.4.1. I tried keeping the same conf for the
 cores as in 1.4.0, added the relevant core names in solr.xml and restarted
 Solr, but the old cores don't show up in the browser at localhost:8983. There
 were a few cores in examples/multicore/ in the Solr 1.4.1 source from where
 I downloaded it; these cores, when included in solr.xml, do show up in the
 browser.
 
 Please do let me know the reason. Is there anything I need to do for the core
 migration? I don't have any data in these cores. Also, if there were data, is
 there a nice way of migrating from 1.4.0 to 1.4.1 (which does not involve
 reindexing)?
 Regards,
 Prasad

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr with Unknown Lucene Index?

2011-01-24 Thread Lee Goddard
Having found some code that searches a Lucene index, the only analyzer
referenced is Lucene.Net.Analysis.Standard.StandardAnalyzer.


How can I map this in Solr? The example schema doesn't seem to mention
this, and specifying 'text' or 'string' for every field doesn't seem to
help.
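For reference, one thing that may be worth trying, as a sketch only: Solr can
point a field type at a Lucene analyzer class directly, provided the class is
on Solr's classpath and its version matches the one that built the index:

   <fieldType name="text_std" class="solr.TextField">
     <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
   </fieldType>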


Thanks
Lee

On 22/01/2011 21:50, Erick Erickson wrote:
Sorry, I was out of town for a while. Luke just reads stuff; it
doesn't try to interpret any schema.
Solr makes certain assumptions about what *should* be in the index
based on the schema.
So getting Solr to just use a Lucene index really involves knowing
that Lucene used, say, a StandardAnalyzer followed by a LowerCaseFilter
followed by ... for some field. And there's
no way I know of to find that information out from a raw Lucene index.

If you don't get things to match, your results will...er...vary. But 
perhaps you can guess

well enough to make it work, although upgrading will be a problem.

I really think your effort would be best spent finding the original 
indexing or querying
code if at all possible and seeing the way that code defined the 
analysis chain (in the
code) for each field and using that as a basis for creating a close
enough schema.



Best
Erick

On Thu, Jan 20, 2011 at 3:59 AM, Lee Goddard lee...@gmail.com wrote:


Thanks, Erick. I think my question comes down to, 'how does Luke
know how to read the indexes?' I will try the Luke mailing list.

Cheers
Lee


On 19/01/2011 17:49, Erick Erickson wrote:

I don't really think this is possible/reasonable. There's nothing
fixed about
a Lucene index, you could index a field in different documents
with any
number of analysis chains. The tricky part here will, as you've
discovered,
find a way to match the Solr schema closely enough to get your
desired
results.

Are you sure there's no way to re-index the data? Or find the
original code
that indexed it?

Best
Erick

On Wed, Jan 19, 2011 at 3:22 AM, Lee Goddard lee...@gmail.com wrote:

I have to use some Lucene indexes, and Solr looks like the
perfect solution.

However, all I know about the Lucene indexes are what Luke
tells me, and simply setting the schema to represent all
fields as text does not seem to be working -- though as this
is my first Solr, I am not sure if that is due to some other
issue.

Is there some way to ascertain how the Solr schema should
describe the Lucene fields?

Many thanks in anticipation
Lee






Re: Taxonomy in SOLR

2011-01-24 Thread Em

Hi Damien,

ahm, the formula I wrote was no definitive guide, just some numbers I
combined to visualize the amount of data - perhaps not even a complete
formula.

Well, when you can use your taxonomy as indexed-only you do not double the
used disk space when you are indexing two equal documents.

Lucene - and also Solr - are working with an inverted index: This means
every document is mapped against its indexed terms.
So your index-size will depend on the number of unique taxonomy-terms and
the pointers of the documents to these terms. That's it. Usually the used
disk-space for an index is much smaller than the size of the original data.

I hope what I tried to explain was easy to understand.

Regards


Re: please help Problem with dataImportHandler

2011-01-24 Thread Ezequiel Calderara
And what do the logs say about it?

On Mon, Jan 24, 2011 at 7:15 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:


 actually its a log file i seperately created an handler for that... its not
 XML

 -
 DINESHKUMAR . M
 I am neither especially clever nor especially gifted. I am only very, very
 curious.




-- 
__
Ezequiel.

Http://www.ironicnet.com


Re: please help Problem with dataImportHandler

2011-01-24 Thread Dinesh

it's a DHCP log... I want to index it

-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.


Re: Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine



On 24/01/2011 13:10, Em wrote:

Hi Damien,

ahm, the formula I wrote was no definitive guide, just some numbers I
combined to visualize the amount of data - perhaps not even a complete
formula.

Well, when you can use your taxonomy as indexed-only you do not double the
used disk space when you are indexing two equal documents.
So five documents, or 4 million, with the same taxonomy use the same disk
space as one?



Lucene - and also Solr - are working with an inverted index: This means
every document is mapped against its indexed terms.
So your index-size will depend on the number of unique taxonomy-terms and
the pointers of the documents to these terms. That's it. Usually the used
disk-space for an index is much smaller than the size of the original data.

I hope what I tried to explain was easy to understand.

Thanks, it's very helpful!

Where can I find more explanation of the internal structure of the Lucene
index?


Damien


Re: please help Problem with dataImportHandler

2011-01-24 Thread Ezequiel Calderara
I mean, when you run the DIH, what's the output of the Solr log? Probably
there is more info about what's happening...
On Mon, Jan 24, 2011 at 10:28 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:


 its a DHCP log.. i want ti index it

 -
 DINESHKUMAR . M
 I am neither especially clever nor especially gifted. I am only very, very
 curious.




-- 
__
Ezequiel.

Http://www.ironicnet.com


Re: Indexing spatial columns

2011-01-24 Thread Adam Estrada
Hi MapButcher,

There are a couple of things that are going on here.

1. The spatial functionality is confusing between versions of Solr. I wish
someone would update the Solr SpatialSearch wiki page.
2. You will want to use the jTDS driver here instead of the one from
Microsoft. http://jtds.sourceforge.net/ It works a little better.
3. For Solr 4.0 you will basically have to concatenate the lat/long fields
into a single column, which in the example schema is called store; see the
sketch below.
4. I don't know if individual columns actually exist for latitude and longitude
in 4.0, but in 1.4.x I know the lat/long fields HAD to be called lat and lng
and had to be of tdouble type, which I see below.
5. Revert back to Solr 1.4.x and try using their plugin:
http://www.jteam.nl/news/spatialsolr.html
6. Try your queries in the Solr admin tool first before trying to integrate
this into your code.

Overall, I have had great success with Solr spatial in just doing a simple
radius search. I am using the core 4.0 functionality and am having no problems.
I will eventually get into distance and bounding box queries, so whatever you
figure out and share would be great!
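A sketch of point 3 using DIH's TemplateTransformer (assuming the example
schema's store field of type location, which expects "lat,lon" order; here
POINT_Y is taken to be the latitude and POINT_X the longitude):

  <entity name="poi" transformer="TemplateTransformer"
          query="select OBJECTID,CATEGORY,NAME,POINT_X,POINT_Y from NZ_POI">
    <field column="store" template="${poi.POINT_Y},${poi.POINT_X}"/>
  </entity>

with, in schema.xml:

  <field name="store" type="location" indexed="true" stored="true"/>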

Good luck,
Adam

On Jan 24, 2011, at 4:46 AM, mapbutcher wrote:

 
 Hi,
 
 I'm a bit of a solr beginner. I have installed Solr 4.0 and I'm trying to
 index some spatial data stored in a sql server instance. I'm using the
 DataImportHandler; here is my data-config.xml:
 
 <dataConfig>
   <dataSource type="JdbcDataSource"
               driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
               url="jdbc:sqlserver://localhost\sqlserver08;databaseName=Spatial"
               user="sa" password="sqlserver08"/>
   <document>
     <entity name="poi" query="select OBJECTID,CATEGORY,NAME,POINT_X,POINT_Y
                               from NZ_POI">
       <field column="OBJECTID" name="id"/>
       <field column="CATEGORY" name="category"/>
       <field column="NAME" name="name"/>
       <field column="POINT_X" name="lat"/>
       <field column="POINT_Y" name="lon"/>
     </entity>
   </document>
 </dataConfig>
 
 In my schema file I have following definition:
 
    <field name="category" type="string" indexed="true" stored="true"/>
    <field name="name" type="string" indexed="true" stored="true"/>
    <field name="lat" type="tdouble" indexed="true" stored="true"/>
    <field name="lon" type="tdouble" indexed="true" stored="true"/>
 
    <copyField source="category" dest="text"/>
    <copyField source="name" dest="text"/>
 
 I have completed a data import with no errors in the log as far as i can
 tell. However when i inspect the schema i do not see the columns names
 lat\lon. When sending the query:
 
 http://localhost:8080/Solr/select/?q=Camp AND _val_:recip(dist(2, lon, lat,
 44.794, -93.2696), 1, 1, 0)^100 
 
 I get an error undefined column. 
 
 Does anybody have any ideas about whether the above is the correct procedure
 for indexing spatial data?
 
 Cheers
 
 S
 
 



Re: Taxonomy in SOLR

2011-01-24 Thread Em

Just for illustration:

This is your original data:

doc1 : hello world
doc2: hello daniem
doc3: hello pal

Now, Lucene produces something like this from the input:
hello: id_doc1,id_doc2,id_doc3
daniem: id_doc2
pal: id_doc3

Well, it's more complex, but enough for illustration.
As you can see, the representation of a document is completely different.
A document costs only a few bytes for a Lucene-internal id per word.

If words occur more than one time per document AND you do not store
termVectors, Lucene just adds the number of occurrences per word per doc to
its index:

hello: id_doc1[1],id_doc2[1],id_doc3[1]
daniem: id_doc2[1]
pal: id_doc3[1]

Imagine what happens to longer texts where especially stopwords or important
words occur more than one time.

I would suggest to start with the Lucene-Wiki, if you want to learn more
about Lucene.

Regards,
Em


Re: fieldType textgen. tokens > 2

2011-01-24 Thread Erick Erickson
You need to get more familiar with debugging; spending the time on it
is well worth the effort.

But assuming the '+' signs in your pasted query are really URL-encoded spaces,
your syntax is really confused.

sender:(name1 name2 name3 OR sender_2:name1 name2 name3)

It *looks* like you intend something like
sender:(name1 name2 name3) OR sender_2:(name1 name2 name3)

note the added parentheses.

Best
Erick


On Mon, Jan 24, 2011 at 6:04 AM, stockii stock.jo...@googlemail.com wrote:


 that is my query:  q=sender:name1+name2+name3
 exaclty the request is:
 q=sender:(name1+name2+name3+OR+sender_2:name1+name2+name3)

 so solr is using another field for name2 and name3 ?

 debugging cannot help me, or i dont understand the debugging ...
 when i search only for name1 + name2 search is okay. but with name3 not
 ...
 in my test-enironment i used the same fieldType but it works fine...

 -
 --- System
 

 One Server, 12 GB RAM, 2 Solr Instances, 7 Cores,
  1 Core with 31 Million Documents, other Cores < 100.000

 - Solr1 for Search-Requests - commit every Minute  - 4GB Xmx
 - Solr2 for Update-Request  - delta every 2 Minutes - 4GB Xmx



Re: fieldType textgen. tokens > 2

2011-01-24 Thread stockii

I got this query from the mailing list.

But I found the problem: wrong query. I don't know why I constructed my query
like that ... =(



But thanks for your help =)


-
--- System


One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents, other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 4GB Xmx
- Solr2 for Update-Request  - delta every 2 Minutes - 4GB Xmx


Re: one last question on dynamic fields

2011-01-24 Thread Stefan Matheis
Yes, you can =) Prefix & suffix, both work fine.

On Sun, Jan 23, 2011 at 9:54 PM, Geert-Jan Brits gbr...@gmail.com wrote:

 Yep you can. Although I'm not sure you can use a wildcard-prefix. (perhaps
 you can I'm just not sure) . I always use wildcard-suffixes.

 Cheers,
 Geert-Jan

 2011/1/23 Dennis Gearon gear...@sbcglobal.net

  Is it possible to use ONE definition of a dynamic field type for inserting
  multiple dynamic fields of that type with different names? Or do I need a
  separate dynamic field definition for each eventual field?
 
  Can I do this?
  in schema.xml:
   <field name="ALL_OTHER_STANDARD_FIELDS" type="OTHER_TYPES"
          indexed="SOME_TIMES" stored="USUALLY"/>
   <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
   .
   .
 
  and then do this for the insert:
  <add>
  <doc>
   <field name="ALL_OTHER_STANDARD_FIELDS">all their values</field>
   <field name="customA_i">9802490824908</field>
   <field name="customB_i">9809084</field>
   <field name="customC_i">09845970011</field>
   <field name="customD_i">09874523459870</field>
  </doc>
  </add>
 
   Dennis Gearon
 
 
  Signature Warning
  
  It is always a good idea to learn from your own mistakes. It is usually a
  better
  idea to learn from others’ mistakes, so you do not have to make them
  yourself.
  from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
 
 
  EARTH has a Right To Life,
  otherwise we all die.
 
 



Re: Taxonomy in SOLR

2011-01-24 Thread Erick Erickson
First, the redundancy is certainly there, but that's what Solr does: handle
large amounts of data. 4 million documents is actually a pretty small corpus
by Solr standards, so you may well be able to do exactly what you propose with
acceptable performance/size. I'd advise just trying it with, say, 200,000 docs.
Why 200K? Because index growth is non-linear, with the first bunch of documents
taking up more space than the second. So index 100K, examine your indexes,
and index 100K more. Now use the delta to extrapolate to 4M.

You don't need to store the taxonomy in each doc for auto-complete, you can
get your auto-completion from a different index. Or you can index your
taxonomies
in a special document in Solr and query the (unique) field in that
document for
autocomplete.
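(A sketch of such an autocomplete lookup via the TermsComponent, assuming the
/terms handler from the example solrconfig.xml and the label field from earlier
in this thread:

  http://localhost:8983/solr/terms?terms.fl=english_taxon_label&terms.prefix=ber&terms.limit=10
)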

For faceting, you do need taxonomies. But remember that the nature of the
inverted index is that unique terms are only stored once, and the document
ID for each document that that term appears in is recorded. So if you have
3/europe/germany/berlin stored in 1M documents, your index space is really
string length + overhead + space for 1M ids.

Best
Erick

On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine dfonta...@rosebud.fr wrote:

 Yes, i am not obliged to store taxonomies.

 My taxonomies are type of

 english_taxon_label = Berlin
 english_taxon_type = location
 english_taxon_hierarchy = 0/world
  1/world/europe
  2/world/europe/germany
  3/world/europe/germany/berlin

 I need *_taxon_hierarchy to faceting and label to auto complete.

 With a RDBMs, i have 100 entry max for one taxo, but with solr and 4
 million documents the redundancy is huge, no?

 And i have 10 different taxonomies per document 

 Damien

 On 24/01/2011 10:30, Em wrote:

  Hi Damien,

 why are you storing the taxonomies?
 When it comes to faceting, it only depends on indexed values. If there is
 a
 meaningful difference between the indexed and the stored value, I would
 prefer to use an RDBMS or something like that to reduce redundancy.

 Does this help?

 Regards





Re: searching based on grouping result

2011-01-24 Thread Steve Fuchs
Hi, thanks for the response.

I didn't explain myself well; I am using the field collapsing and things are
working as that page describes.

I think my problem is that, as well as field collapsing works, Solr is still
just returning a list of documents. There don't seem to be any operations I can
do on collapsed groups as a whole. They are more of a display thing that can't
be referenced in the query. Same thing with facets? Am I right in this?

thanks again,
steve

On Jan 22, 2011, at 12:53 AM, Otis Gospodnetic wrote:

 Steve,
 
 Does http://wiki.apache.org/solr/FieldCollapsing do what you need?
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
 - Original Message 
 From: Steve Fuchs st...@aps.org
 To: solr-user@lucene.apache.org
 Sent: Fri, January 21, 2011 3:05:32 PM
 Subject: searching based on grouping result
 
 Hello All,
 
  My index documents represent a set of papers, each with an author id and the
  id of the referee that reviewed the paper.
 
  I also end up with a field in each document that tells me whether the referee
  still has the paper but has not graded it. This can be a boolean.
 
  In my final result I want to collapse the result by referee number and omit
  any referee that has this boolean true; it doesn't matter how many documents
  they have with the field set to false.
 
  Is there a way to set my query to honor the results of the grouping (or of
  a facet?), as in q: -referee_number.open_flag:* ?
 
 
 Thanks in advance.
 
 steve
 



Re: Multicore Relaod Theoretical Question

2011-01-24 Thread Alexander Kanarsky
Em,

that's correct. You can use 'lsof' to see file handles still in use.
See 
http://0xfe.blogspot.com/2006/03/troubleshooting-unix-systems-with-lsof.html,
Recipe #11.
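For example (a sketch; the index path and PID are placeholders):

  lsof | grep data/index              # index files still held open, by whom
  lsof -p <solr_pid> | grep deleted   # deleted files the JVM still references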

-Alexander

On Sun, Jan 23, 2011 at 1:52 AM, Em mailformailingli...@yahoo.de wrote:

 Hi Alexander,

 thank you for your response.

 You said that the old index files were still in use. That means Linux does
 not *really* delete them until Solr frees its locks from it, which happens
 while reloading?



 Thank you for sharing your experiences!

 Kind regards,
 Em


 Alexander Kanarsky wrote:

 Em,

 yes, you can replace the index (get the new one into a separate folder
 like index.new and then rename it to the index folder) outside the
 Solr, then just do the http call to reload the core.

 Note that the old index files may still be in use (continue to serve
 the queries while reloading), even if the old index folder is deleted
 - that is on Linux filesystems, not sure about NTFS.
 That means the space on disk will be freed only when the old files are
 not referenced by Solr searcher any longer.

 -Alexander

 On Sat, Jan 22, 2011 at 1:51 PM, Em mailformailingli...@yahoo.de wrote:

 Hi Erick,

 thanks for your response.

 Yes, it's really not that easy.

 However, the target is to avoid any kind of master-slave-setup.

 The most recent idea i got is to create a new core with a data-dir
 pointing
 to an already existing directory with a fully optimized index.

 Regards,
 Em







Faceting Question

2011-01-24 Thread beaviebugeater

I am attempting to do facets on products similar to how hayneedle does it on
their online stores (they do NOT use Solr).   See:
http://www.clockstyle.com/wall-clocks/antiqued/1359+1429+4294885075.cfm

So simple example, my left nav might contain categories and 2 attributes,
brand and capacity:

Categories
- Cat1 (23) selected
- Cat2 (16)
- Cat3 (5)

Brand
-Brand1 (18)
-Brand2 (10)
-Brand3 (0)

Capacity
-Capacity1 (14)
-Capacity2 (9)


Each category or attribute value is represented with a checkbox and can be
selected or deselected.

The initial entry into this page has one category selected.  Other
categories can be selected which might change the number of products related
to each attribute value.  The number of products in each category never
changes.

I should also be able to select one or more attribute.  

Logically this would look something like:

(Cat1 Or Cat2) AND (Value1 OR Value2) AND (Value4)

Behind the scenes I have each category and attribute value represented by a
tag, which is just a numeric value.  So I search on the tags field only
and then facet on category, brand and capacity fields which are stored
separately.  

My current Solr query ends up looking something like:

fq={!tag=tag1}tags:( |1003| |1007|) AND tags:(
|10015|)&version=2.2&start=0&rows=10&indent=on&facet=on&facet.field={!ex=tag1}category&facet.field=capacity&facet.field=brand

This shows 2 categories being selected (1003 and 1007) and one attribute
value (10015). 

This partially works - the categories work fine. The problem is, if I
select, say, a brand attribute (in the above example, the 10015 tag), it
does filter to the selected categories AND the selected attribute, BUT I'm
not able to broaden the search by selecting another attribute value.

I want the display of products to be filtered to what I select, but I want to
be able to broaden the filter without having to back up.

I feel like I'm close but still missing something. Is there a way to
specify 2 tags that should be excluded from facet fields?

I hope this example makes sense.

Any help greatly appreciated.


Re: Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine

Thanks Em and Erick for your answers.

Now I better understand how Solr works.

Damien

On 24/01/2011 16:23, Erick Erickson wrote:

First, the redundancy is certainly there, but that's what Solr does, handles
large
amounts of data. 4 million documents is actually a pretty small corpus by
Solr
standards, so you may well be able to do exactly what you propose with
acceptable performance/size. I'd advise just trying it with, say, 200,000
docs.
Why 200K? because index growth is non-linear with the first bunch of
documents
taking up more space than the second. So index 100K, examine your indexes
and index 100K more. Now use the delta to extrapolate to 4M.

You don't need to store the taxonomy in each doc for auto-complete, you can
get your auto-completion from a different index. Or you can index your
taxonomies
in a special document in Solr and query the (unique) field in that
document for
autocomplete.

For faceting, you do need taxonomies. But remember that the nature of the
inverted index is that unique terms are only stored once, and the document
ID for each document that that term appears in is recorded. So if you have
3/europe/germany/berlin stored in 1M documents, your index space is really
string length + overhead + space for 1M ids.

Best
Erick

On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine dfonta...@rosebud.fr wrote:


Yes, i am not obliged to store taxonomies.

My taxonomies are type of

english_taxon_label = Berlin
english_taxon_type = location
english_taxon_hierarchy = 0/world
  1/world/europe
  2/world/europe/germany
  3/world/europe/germany/berlin

I need *_taxon_hierarchy to faceting and label to auto complete.

With a RDBMs, i have 100 entry max for one taxo, but with solr and 4
million documents the redundancy is huge, no?

And i have 10 different taxonomies per document 

Damien

On 24/01/2011 10:30, Em wrote:

  Hi Damien,

why are you storing the taxonomies?
When it comes to faceting, it only depends on indexed values. If there is
a
meaningful difference between the indexed and the stored value, I would
prefer to use an RDBMS or something like that to reduce redundancy.

Does this help?

Regards







Re: Taxonomy in SOLR

2011-01-24 Thread Em

Hi Erick,

in some use cases I really think that your suggestion of special
unique documents for meta-information is a good approach to solve some
issues.
However, there is a hurdle for me, and maybe you can help me clear it:

What is the best way to get such meta-data?
I see three possible approaches:
1st: get it in another request
2nd: get it with a requestHandler
3rd: get it with a searchComponent

I think the 2nd and 3rd are the cleanest ways.
But to make a decision between them I run into two problems:
RequestHandler: Should I extend the StandardRequestHandler to do what I
need? If so, I could just query my index for the needed information and add
it to the request before I pass it on to the SearchComponents.

SearchComponent: The problem with the SearchComponent is the distributed
thing and how to test it. However, if this would be the cleanest way to go,
one should take that route.

What would you do, if you want to add some meta-information to your request
that was not given by the user?
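For what it's worth, a minimal sketch of the SearchComponent route I have in
mind (the class name is made up, and the SolrInfoMBean methods are only stubbed):

  import java.io.IOException;

  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  public class MetaInfoComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      // nothing to prepare
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      // look up the meta-document (e.g. via rb.req.getSearcher())
      // and attach the result to the response
      rb.rsp.add("meta", "taxonomy info goes here");
    }

    @Override public String getDescription() { return "adds meta-information"; }
    @Override public String getSource() { return "$Source$"; }
    @Override public String getSourceId() { return "$Id$"; }
    @Override public String getVersion() { return "1.0"; }
  }

It would be registered with <searchComponent name="meta" class="MetaInfoComponent"/>
in solrconfig.xml and appended to a handler's last-components list.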

Regards,
Em


Erick Erickson wrote:
 
 First, the redundancy is certainly there, but that's what Solr does,
 handles
 large
 amounts of data. 4 million documents is actually a pretty small corpus by
 Solr
 standards, so you may well be able to do exactly what you propose with
 acceptable performance/size. I'd advise just trying it with, say, 200,000
 docs.
 Why 200K? because index growth is non-linear with the first bunch of
 documents
 taking up more space than the second. So index 100K, examine your indexes
 and index 100K more. Now use the delta to extrapolate to 4M.
 
 You don't need to store the taxonomy in each doc for auto-complete, you
 can
 get your auto-completion from a different index. Or you can index your
 taxonomies
 in a special document in Solr and query the (unique) field in that
 document for
 autocomplete.
 
 For faceting, you do need taxonomies. But remember that the nature of the
 inverted index is that unique terms are only stored once, and the document
 ID for each document that that term appears in is recorded. So if you have
 3/europe/germany/berlin stored in 1M documents, your index space is really
 string length + overhead + space for 1M ids.
 
 Best
 Erick
 
 On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine
 dfonta...@rosebud.fr wrote:
 
 Yes, i am not obliged to store taxonomies.

 My taxonomies are type of

 english_taxon_label = Berlin
 english_taxon_type = location
 english_taxon_hierarchy = 0/world
  1/world/europe
  2/world/europe/germany
 
 3/world/europe/germany/berlin

 I need *_taxon_hierarchy to faceting and label to auto complete.

 With a RDBMs, i have 100 entry max for one taxo, but with solr and 4
 million documents the redundancy is huge, no?

 And i have 10 different taxonomies per document 

 Damien

 Le 24/01/2011 10:30, Em a écrit :

  Hi Damien,

 why are you storing the taxonomies?
 When it comes to faceting, it only depends on indexed values. If there
 is
 a
 meaningful difference between the indexed and the stored value, I would
 prefer to use an RDBMS or something like that to reduce redundancy.

 Does this help?

 Regards



 
 



Re: Faceting Question

2011-01-24 Thread Geert-Jan Brits
 fq={!tag=tag1}tags:( |1003| |1007|) AND tags:(|10015|)&version=2.2&start=0&rows=10&indent=on&facet=on&facet.field={!ex=tag1}category&facet.field=capacity&facet.field=brand

I'm just guessing here, but perhaps {!tag=tag1} is only picking up the 'tags:(
|1003| |1007|) '-part. If so {!ex=tag1} would only exclude 'tags:( |1003|
|1007|) ' but it wouldn't exclude ' tags:(
|10015|)'

I believe this would 100% explain what you're seeing.

Assuming my guess is correct you could try a couple of things (none of
which I'm absolutely certain will work, but you could try them out easily):
1. put the fq in quotes: fq={!tag=tag1}"tags:( |1003| |1007|) AND tags:(|10015|)"
 -- this might instruct {!tag=tag1} to tag the whole fq-filter.
2. make multiple fq's, and exclude them all (not sure if you can exclude
multiple tags): fq={!tag=tag1}tags:( |1003| |1007|)&fq={!tag=tag2}tags:(|10015|)&facet.field={!ex=tag1,tag2}category...
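Written out, option 2 would be something like this (an untested sketch, all
one request; it excludes both tags from every facet so each facet ignores
its own filter):

fq={!tag=tag1}tags:( |1003| |1007|)&fq={!tag=tag2}tags:(|10015|)&facet=on&facet.field={!ex=tag1,tag2}category&facet.field={!ex=tag1,tag2}brand&facet.field={!ex=tag1,tag2}capacity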

hth,
Geert-Jan

2011/1/24 beaviebugeater mbro...@cox.net


 I am attempting to do facets on products similar to how hayneedle does it
 on
 their online stores (they do NOT use Solr).   See:
 http://www.clockstyle.com/wall-clocks/antiqued/1359+1429+4294885075.cfm

 So simple example, my left nav might contain categories and 2 attributes,
 brand and capacity:

 Categories
 - Cat1 (23) selected
 - Cat2 (16)
 - Cat3 (5)

 Brand
 -Brand1 (18)
 -Brand2 (10)
 -Brand3 (0)

 Capacity
 -Capacity1 (14)
 -Capacity2 (9)


 Each category or attribute value is represented with a checkbox and can be
 selected or deselected.

 The initial entry into this page has one category selected.  Other
 categories can be selected which might change the number of products
 related
 to each attribute value.  The number of products in each category never
 changes.

 I should also be able to select one or more attribute.

 Logically this would look something like:

 (Cat1 Or Cat2) AND (Value1 OR Value2) AND (Value4)

 Behind the scenes I have each category and attribute value represented by a
 tag, which is just a numeric value.  So I search on the tags field only
 and then facet on category, brand and capacity fields which are stored
 separately.

 My current Solr query ends up looking something like:

  fq={!tag=tag1}tags:( |1003| |1007|) AND tags:(|10015|)&version=2.2&start=0&rows=10&indent=on&facet=on&facet.field={!ex=tag1}category&facet.field=capacity&facet.field=brand

 This shows 2 categories being selected (1003 and 1007) and one attribute
 value (10015).

 This partially works - the categories work fine.   The problem is, if I
 select, say a brand attribute (as in the above example the 10015 tag) it
 does filter to the selected categories AND the selected attribute BUT I'm
 not able to broaden the search by selecting another attribute value.

 I want to display of products to be filtered to what I select, but I want
 to
 be able to broaden the filter without having to back up.

 I feel like I'm close but still missing something.  Is there a way to
 specify 2 tags that should be excluded from facet fields?

 I hope this example makes sense.

 Any help greatly appreciated.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Faceting-Question-tp2320542p2320542.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Taxonomy in SOLR

2011-01-24 Thread Erick Erickson
I wasn't thinking about this for adding information to the *request*.
Rather, in this
case the autocomplete uses an Ajax call that just uses the TermsComponent
to get the autocomplete data and display it. This is just textual, so adding
it to the
request is client-side magic.
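For instance, something like this (just a sketch -- it assumes the
TermsComponent is registered at /terms as in the example solrconfig.xml,
and borrows Damien's field name):

http://localhost:8983/solr/terms?terms=true&terms.fl=english_taxon_label&terms.prefix=ber&terms.limit=10

The Ajax layer then simply renders the returned terms as suggestions.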

If you want your app to have access to the meta-data for other purposes,
you'd
just query and cache it from the app. You could use that to build up the
links
you embed in the page for new queries if you chose, no custom handlers
necessary.

Otherwise, I guess you'd create a custom request handler, that seems like a
reasonable place.

Best
Erick

On Mon, Jan 24, 2011 at 11:03 AM, Em mailformailingli...@yahoo.de wrote:


 Hi Erick,

 in some usecases I really think that your suggestion with some
 unique-documents for meta-information is a good approach to solve some
 issues.
 However there is a hurdle for me and maybe you can help me to clear it:

 What is the best way to get such meta-data?
 I see three possible approaches:
 1st: get it in another request
 2nd: get it with a requestHandler
 3rd: get it with a searchComponent

 I think the 2nd and 3rd are the cleanest ways.
 But to make a decision between them I run into two problems:
 RequestHandler: Should I extend the StandardRequestHandler to do what I
 need? If so, I could just query my index for the needed information and add
 it to the request before I pass it up the SearchComponents.

 SearchComponent: The problem with the SearchComponent is the distributed
 thing and how to test it. However, if this would be the cleanest way to go,
 one should go it.

 What would you do, if you want to add some meta-information to your request
 that was not given by the user?

 Regards,
 Em


 Erick Erickson wrote:
 
  First, the redundancy is certainly there, but that's what Solr does,
  handles
  large
  amounts of data. 4 million documents is actually a pretty small corpus by
  Solr
  standards, so you may well be able to do exactly what you propose with
  acceptable performance/size. I'd advise just trying it with, say, 200,000
  docs.
  Why 200K? because index growth is non-linear with the first bunch of
  documents
  taking up more space than the second. So index 100K, examine your indexes
  and index 100K more. Now use the delta to extrapolate to 4M.
 
  You don't need to store the taxonomy in each doc for auto-complete, you
  can
  get your auto-completion from a different index. Or you can index your
  taxonomies
  in a special document in Solr and query the (unique) field in that
  document for
  autocomplete.
 
  For faceting, you do need taxonomies. But remember that the nature of the
  inverted index is that unique terms are only stored once, and the
 document
  ID for each document that that term appears in is recorded. So if you
 have
  3/europe/germany/berlin stored in 1M documents, your index space is
 really
  string length + overhead + space for 1M ids.
 
  Best
  Erick
 
  On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine
  dfonta...@rosebud.frwrote:
 
  Yes, i am not obliged to store taxonomies.
 
  My taxonomies are type of
 
  english_taxon_label = Berlin
  english_taxon_type = location
   english_taxon_hierarchy = 0/world
                             1/world/europe
                             2/world/europe/germany
                             3/world/europe/germany/berlin
 
  I need *_taxon_hierarchy to faceting and label to auto complete.
 
  With a RDBMs, i have 100 entry max for one taxo, but with solr and 4
  million documents the redundandcy is huge, no ?
 
  And i have 10 different taxonomies per document 
 
  Damien
 
  Le 24/01/2011 10:30, Em a écrit :
 
   Hi Damien,
 
  why are you storing the taxonomies?
  When it comes to faceting, it only depends on indexed values. If there
  is
  a
  meaningful difference between the indexed and the stored value, I would
  prefer to use an RDBMs or something like that to reduce redundancy.
 
  Does this help?
 
  Regards
 
 
 
 
 

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2320666.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH serialize

2011-01-24 Thread Stefan Matheis
Hi Rich,

i'm a bit confused after reading your post .. what exactly are you trying to
achieve? Serializing (like http://php.net/serialize) your complete row into
one field? You don't want to search in them, just store and deliver them in
your results? Does that make sense? Sounds a bit strange :)

Regards
Stefan

On Mon, Jan 24, 2011 at 10:03 AM, Papp Richard ccode...@gmail.com wrote:

 Hi Dennis,

  thank you for your answer, but didn't understand why you say it doesn't
 need serialization. I'm with the option C.
  but the main question is, how to put into one field a result of many
 fields: SELECT * FROM.

 thanks,
  Rich

 -Original Message-
 From: Dennis Gearon [mailto:gear...@sbcglobal.net]
 Sent: Monday, January 24, 2011 02:07
 To: solr-user@lucene.apache.org
 Subject: Re: DIH serialize

 Depends on your process chain to the eventual viewer/consumer of the data.

 The questions to ask are:
  A/ Is the data IN Solr going to be viewed or processed in its original
 form:
  --set stored = 'true'
 ---no serialization needed.
  B/ If it's going to be analyzed and searched for separate from any other
 field, the analyzing will put it into an unreadable form. If you need to
 see it, then
 ---set indexed=true and stored=true
 ---no serialization needed.
  C/ If it's NOT going to be viewed AS IS, and it's not going to be searched
 for AS IS, (i.e. other columns will be how the data is found), and you have
 another, serializable format:
   --set indexed=false and stored=true
   --serialize AS PER THE INTENDED APPLICATION,
   not sure that Solr can do that at all.
  D/ If it's NOT going to be viewed AS IS, BUT it's going to be searched for
 AS IS, (this column will be how the data is found), and you have another,
 serializable format:
   --you need to put it into TWO columns
   --A SERIALIZED FIELD
   --set indexed=false and stored=true

   --AN UNSERIALIZED FIELD
   --set indexed=false and stored=true
   --serialize AS PER THE INTENDED APPLICATION,
   not sure that Solr can do that at all.

 Hope that helps!


 Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better
 idea to learn from others’ mistakes, so you do not have to make them
 yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.



 - Original Message 
 From: Papp Richard ccode...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Sun, January 23, 2011 2:02:05 PM
 Subject: DIH serialize

 Hi all,



  I wasted the last few hours trying to serialize some column values (from
 mysql) into a Solr column, but I just can't find such a function. I'll use
 the value in PHP - I don't know if it is possible to serialize in PHP style
 at all. This is what I tried and works with a given factor:



 in schema.xml:

   <field name="main_timetable" type="text" indexed="false"
 stored="true" multiValued="true" />



 in DIH xml:



 <dataConfig>

   <script><![CDATA[
     function my_serialize(row)
     {
       row.put('main_timetable', row.toString());
       return row;
     }
   ]]></script>

 .

   <entity name="main_timetable"
           query="SELECT * FROM shop_time_table stt WHERE stt.shop_id = '${shop.id}'"
           transformer="script:my_serialize">

 .

 



  Can I use java directly in script (<script language="Java">) ?

  How could I achieve this? Or any other idea?

  I need these values together (from a row) and I need then in PHP to handle
 the result easily.



 thanks,

  Rich







Re: searching based on grouping result

2011-01-24 Thread Stefan Matheis
Steve,

and what exactly do you expect? You can work on the group itself with
http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters in a limited
way, but of course it's just a normal Solr result, grouped by some values,
nothing really special.

Can't be referenced in the query - what do you want to do there?

Regards
Stefan

On Mon, Jan 24, 2011 at 4:27 PM, Steve Fuchs st...@aps.org wrote:

 Hi, thanks for the response.

 I didn't explain myself well, I am using the field collapsing and things
 are working as that page describes.

 I think my problem is that as well as field collapsing works, solr is still
 just returning a list of documents. There don't seem to be any operations I
 can do on collapsed groups as a whole. They are more of a display thing that
 can't be referenced in the query. Same thing with facets? Am I right in
 this?

 steve



 thanks again
 steve

 On Jan 22, 2011, at 12:53 AM, Otis Gospodnetic wrote:

  Steve,
 
  Does http://wiki.apache.org/solr/FieldCollapsing do what you need?
 
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
 
 
  - Original Message 
  From: Steve Fuchs st...@aps.org
  To: solr-user@lucene.apache.org
  Sent: Fri, January 21, 2011 3:05:32 PM
  Subject: searching based on grouping result
 
  Hello All,
 
  My index documents represent a set of papers, each with an author id and
  the id of the referee that reviewed the paper.

  I also end up with a field in each document that tells me whether the
  referee still has the paper, but has not graded it. This can be a boolean.
 
  In my final result I  want to collapse the result by referee number and
 omit
  any referee that has this  boolean true, it doesn't matter how many
 documents
  they have with the field set  to false.
 
  Is there a way to set my query to honor the results of the  grouping (or
 of a
  facet?) as in q:  -referee_number.open_flag:*
 
  ?
 
 
  Thanks in advance.
 
  steve
 




RE: help integrating katta with solr

2011-01-24 Thread Jerry Mindek
Hi Otis,

I was implementing Katta because I discovered it before Solr Cloud.
Before replying to your email, I took some time to go through the examples
on the solr cloud wiki.

The examples worked without any issue for me and I now have a better
understanding of what solr cloud is offering. 

My experience with it so far is good.

It seems to me that Solr Cloud and Katta both offer failover using
zookeeper, load balancing, and easier shard deployment and shard searching. 

These are all important issues for my company and me, as we have many sharded
indexes. We are always looking for ways to simplify and shorten the time it
takes to index, deploy, maintain, and troubleshoot those sharded
collections.

A major difference I see between the two is that Katta relies on
Hadoop HDFS for storage whereas Solr Cloud has no such dependence.

I still would like to integrate Katta into Solr. If for no other reason than
to complete a task that I set out to do. Also, it would be nice to explore
its differences from solr cloud, giving us a choice in which solution to
implement.

So, I am still looking for some assistance integrating Katta with Solr. :-)


Thanks,
Jerry


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Saturday, January 22, 2011 12:52 AM
To: solr-user@lucene.apache.org
Subject: Re: help integrating katta with solr

Hi Jerry,

Sorry, not a direct answer, but why Katta?  Why not SolrCloud (i.e. trunk)
instead?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Jerry Mindek jerry.min...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Fri, January 21, 2011 4:37:12 PM
 Subject: help integrating katta with solr
 
 Hello,
 
 
 
 I have been trying to integrate Katta with Solr, sadly without success.

 I am using the information from JIRA issue 1395 as a guide. However, this
 information seems out of date and incomplete.
 
 
 
 So far, I have attempted to integrate Katta with both solr trunk  and
 branch-1.4.
 
 I am unable to get the patches applied completely and  am totally unable
to
 compile solr once the patches are applied.
 
 
 
 Could someone provide some tips or, an up to date guide on how to do
this?
 
 
 
 Thanks,
 
 Jerry Mindek
 
 
 
 



Re: searching based on grouping result

2011-01-24 Thread Steve Fuchs
Thanks

What I'd really like to do is to exclude an entire group if a certain field is 
set to true in any of the documents that make up that group. I can't do it at 
index time because some of my users have certain documents hidden from them. So 
they shouldn't see the flag as set, while others would.

I can do it in post processing, but that will mess up sorting and pagination.


Thanks again
steve

On Jan 24, 2011, at 11:39 AM, Stefan Matheis wrote:

 Steve,
 
 and what exactly do you expect? You can work on the Group itself with
 http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters in a limited
 way, but of course it's just a normal Solr-Result, group by some Values,
 nothing really special.
 
 Can't be referenced in the query - what do you want to do there?
 
 Regards
 Stefan
 
 On Mon, Jan 24, 2011 at 4:27 PM, Steve Fuchs st...@aps.org wrote:
 
  Hi, thanks for the response.
 
 I didn't explain myself well, I am using the field collapsing and things
 are working as that page describes.
 
 I think my problem is that as well as field collapsing works, solr is still
 just returning a list of documents. There don't seem to be any operations I
 can do on collapsed groups as a whole. They are more of a display thing that
 can't be referenced in the query. Same thing with facets? Am I right in
 this?
 
 steve
 
 
 
  thanks again
 steve
 
 On Jan 22, 2011, at 12:53 AM, Otis Gospodnetic wrote:
 
 Steve,
 
 Does http://wiki.apache.org/solr/FieldCollapsing do what you need?
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
 - Original Message 
 From: Steve Fuchs st...@aps.org
 To: solr-user@lucene.apache.org
 Sent: Fri, January 21, 2011 3:05:32 PM
 Subject: searching based on grouping result
 
 Hello All,
 
  My index documents represent a set of papers, each with an author id and
  the id of the referee that reviewed the paper.

  I also end up with a field in each document that tells me whether the
  referee still has the paper, but has not graded it. This can be a boolean.
 
 In my final result I  want to collapse the result by referee number and
 omit
 any referee that has this  boolean true, it doesn't matter how many
 documents
 they have with the field set  to false.
 
 Is there a way to set my query to honor the results of the  grouping (or
 of a
 facet?) as in q:  -referee_number.open_flag:*
 
 ?
 
 
 Thanks in advance.
 
 steve
 
 
 



Weird behaviour with phrase queries

2011-01-24 Thread Jerome Renard
Hi,

I have a problem with phrase queries: from time to time I do not get any
results, whereas I know something should be returned.

The search is run against a field of type text whose definition is
available at the following URL:
- http://pastebin.com/Ncem7M8z

This field is defined with the following configuration:
<field name="meta_text" type="text" indexed="true" stored="true"
multiValued="true" termVectors="true"/>

I use the following request handler:
<requestHandler name="custom" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">meta_text</str>
    <str name="pf">meta_text</str>
    <str name="bf"></str>
    <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

Depending on the kind of phrase query I use I get either exactly what I am
looking for or nothing.

The index contents are all French, so I thought about a possible problem
with accents, but I got phrase queries containing é and è chars, like
"académie" or "ingénieur", working.

As you will see, the text type uses the SnowballPorterFilterFactory for
the English language. I plan to fix that by using the correct language for
the index (French) and the following protwords: http://bit.ly/i8JeX6 .

But apart from this mistake with the stemmer, did I do something (else)
wrong? Did I overlook something? What could explain why I do not always get
results for my phrase queries?

Thanks in advance for your feedback.

Best Regards,

--
Jérôme


Re: Multicore Relaod Theoretical Question

2011-01-24 Thread Em

Thanks Alexander, what a valuable resource :).

- Em
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multicore-Relaod-Theoretical-Question-tp2293999p2321335.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Taxonomy in SOLR

2011-01-24 Thread Em

Thank you for the advice, Erick!

I will take a look at extending the StandardRequestHandler for such
use cases.


Erick Erickson wrote:
 
 I wasn't thinking about this for adding information to the *request*.
 Rather, in this
 case the autocomplete uses an Ajax call that just uses the TermsComponent
 to get the autocomplete data and display it. This is just textual, so
 adding
 it to the
 request is client-side magic.
 
 If you want your app to have access to the meta-data for other purposes,
 you'd
 just query and cache it from the app. You could use that to build up the
 links
 you embed in the page for new queries if you chose, no custom handlers
 necessary.
 
 Otherwise, I guess you'd create a custom request handler, that seems like
 a
 reasonable place.
 
 Best
 Erick
 
 On Mon, Jan 24, 2011 at 11:03 AM, Em mailformailingli...@yahoo.de wrote:
 

 Hi Erick,

 in some usecases I really think that your suggestion with some
 unique-documents for meta-information is a good approach to solve some
 issues.
 However there is a hurdle for me and maybe you can help me to clear it:

 What is the best way to get such meta-data?
 I see three possible approaches:
 1st: get it in another request
 2nd: get it with a requestHandler
 3rd: get it with a searchComponent

 I think the 2nd and 3rd are the cleanest ways.
 But to make a decision between them I run into two problems:
 RequestHandler: Should I extend the StandardRequestHandler to do what I
 need? If so, I could just query my index for the needed information and
 add
 it to the request before I pass it up the SearchComponents.

 SearchComponent: The problem with the SearchComponent is the distributed
 thing and how to test it. However, if this would be the cleanest way to
 go,
 one should go it.

 What would you do, if you want to add some meta-information to your
 request
 that was not given by the user?

 Regards,
 Em


 Erick Erickson wrote:
 
  First, the redundancy is certainly there, but that's what Solr does,
  handles
  large
  amounts of data. 4 million documents is actually a pretty small corpus
 by
  Solr
  standards, so you may well be able to do exactly what you propose with
  acceptable performance/size. I'd advise just trying it with, say,
 200,000
  docs.
  Why 200K? because index growth is non-linear with the first bunch of
  documents
  taking up more space than the second. So index 100K, examine your
 indexes
  and index 100K more. Now use the delta to extrapolate to 4M.
 
  You don't need to store the taxonomy in each doc for auto-complete, you
  can
  get your auto-completion from a different index. Or you can index your
  taxonomies
  in a special document in Solr and query the (unique) field in that
  document for
  autocomplete.
 
  For faceting, you do need taxonomies. But remember that the nature of
 the
  inverted index is that unique terms are only stored once, and the
 document
  ID for each document that that term appears in is recorded. So if you
 have
  3/europe/germany/berlin stored in 1M documents, your index space is
 really
  string length + overhead + space for 1M ids.
 
  Best
  Erick
 
  On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine
  dfonta...@rosebud.frwrote:
 
  Yes, i am not obliged to store taxonomies.
 
  My taxonomies are type of
 
  english_taxon_label = Berlin
  english_taxon_type = location
   english_taxon_hierarchy = 0/world
                             1/world/europe
                             2/world/europe/germany
                             3/world/europe/germany/berlin
 
  I need *_taxon_hierarchy to faceting and label to auto complete.
 
  With a RDBMs, i have 100 entry max for one taxo, but with solr and 4
  million documents the redundandcy is huge, no ?
 
  And i have 10 different taxonomies per document 
 
  Damien
 
  Le 24/01/2011 10:30, Em a écrit :
 
   Hi Damien,
 
  why are you storing the taxonomies?
  When it comes to faceting, it only depends on indexed values. If
 there
  is
  a
  meaningful difference between the indexed and the stored value, I
 would
  prefer to use an RDBMs or something like that to reduce redundancy.
 
  Does this help?
 
  Regards
 
 
 
 
 

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2320666.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2321340.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Weird behaviour with phrase queries

2011-01-24 Thread Em

Hi Jerome,

does your fieldtype contain a stopword filter?
Perhaps this could be the root of all evil :-).

Could you provide us the fieldtype definition and the explain-content of an
example-query?
Did you check the analysis.jsp to have a look at the produced results?

Regards,
Em


Jerome Renard wrote:
 
 Hi,
 
 I have a problem with phrase queries, from times to times I do not get any
 result
 where as I know I should get returned something.
 
 The search is run against a field of type text which definition is
 available at the following URL :
 - http://pastebin.com/Ncem7M8z
 
 This field is defined with the following configuration:
 <field name="meta_text" type="text" indexed="true" stored="true"
 multiValued="true" termVectors="true"/>
 
 I use the following request handler:
 <requestHandler name="custom" class="solr.DisMaxRequestHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="qf">meta_text</str>
     <str name="pf">meta_text</str>
     <str name="bf"></str>
     <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
   </lst>
 </requestHandler>
 
 Depending on the kind of phrase query I use I get either exactly what I am
 looking for or nothing.
 
 Index' contents is all french so I thought about a possible problem with
 accents but I got queries working
 with phrase queries containing é and è chars like académie or
 ingénieur.
 
 As you will see the filter used in the text type uses the
 SnowballPorterFilterFactory for the english language,
 I plan to fix that by using the correct language for the index (French)
 and
 the following protwords http://bit.ly/i8JeX6 .
 
 But except this mistake with the stemmer, did I do something (else) wrong
 ?
 Did I overlook something ? What could
 explain I do not always get results for my phrase queries ?
 
 Thanks in advance for your feedback.
 
 Best Regards,
 
 --
 Jérôme
 
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Weird-behaviour-with-phrase-queries-tp2321241p2321362.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Taxonomy in SOLR

2011-01-24 Thread Jonathan Rochkind
There aren't any great general-purpose out-of-the-box ways to handle
hierarchical data in Solr.  Solr isn't an rdbms.


There may be some particular advice on how to set up a particular Solr
index to answer particular questions with regard to hierarchical data.


I saw a great point made recently comparing rdbms to NoSQL stores, which 
applied to Solr too even though Solr is NOT a noSQL store.  In rdbms, 
you set up your schema thinking only about your _data_, and modelling 
your data as flexibly as possible. Then once you've done that, you can 
ask pretty much any well-specified question you want of your data, and 
get a correct and reasonably performant answer.


In Solr, on the other hand, we set up our schemas to answer particular 
questions. You have to first figure out what kinds of questions you will 
want to ask Solr, what kinds of queries you'll want to make, and then 
you can figure out how to structure your data to ask those questions.  
Some questions are actually very hard to set up Solr to answer -- in 
general Solr is about setting up your data so whatever question you have 
can be reduced to asking is token X in field Y.


This can be especially tricky in cases where you want to use a single 
Solr index to answer multiple questions, where the questions are such 
that you really need to set up your data _differently_ to get Solr to 
optimally answer each question.


Solr is not a general purpose store like an rdbms, where you can set up 
your schema once in terms of your data and use it to answer nearly any 
conceivable well-specified question after that.  Instead, Solr does 
things that rdbms can't do quickly or can't do at all.  But you lose 
some things too.


On 1/24/2011 3:03 AM, Damien Fontaine wrote:

Hi,

I am trying Solr and i have one question. In the schema that i set up,
there are 10 fields with always same data(hierarchical taxonomies) but
with 4 million
documents, space disk and indexing time must be big. I need this field
for auto complete. Is there another way to do this type of operation ?

Damien



Re: Weird behaviour with phrase queries

2011-01-24 Thread Erick Erickson
Try submitting your query from the admin page with debugQuery=on and see
if that helps. The output is pretty dense, so feel free to cut-paste the
results for help.
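For instance (URL-encoded, with the handler name taken from your config --
adjust to taste):

http://localhost:8983/solr/select?qt=custom&debugQuery=on&q=%22acad%C3%A9mie+charpentier%22

and paste the <lst name="debug"> section of the response.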

Your stemmers have English as the language, which could also be
interesting.

As Em says, the analysis page may help here, but I'd start by taking out
WordDelimiterFilterFactory, SnowballPorterFilterFactory and
StopFilterFactory
and build back up if you really need them. Although, again, the analysis
page
that's accessible from the admin page may help greatly (check debug in
both
index and query).

Oh, and you MUST re-index after changing your schema to have a true test.

Best
Erick

On Mon, Jan 24, 2011 at 12:31 PM, Jerome Renard jerome.ren...@gmail.comwrote:

 Hi,

 I have a problem with phrase queries, from times to times I do not get any
 result
 where as I know I should get returned something.

 The search is run against a field of type text which definition is
 available at the following URL :
 - http://pastebin.com/Ncem7M8z

 This field is defined with the following configuration:
  <field name="meta_text" type="text" indexed="true" stored="true"
  multiValued="true" termVectors="true"/>

 I use the following request handler:
  <requestHandler name="custom" class="solr.DisMaxRequestHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">meta_text</str>
      <str name="pf">meta_text</str>
      <str name="bf"></str>
      <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>

 Depending on the kind of phrase query I use I get either exactly what I am
 looking for or nothing.

 Index' contents is all french so I thought about a possible problem with
 accents but I got queries working
 with phrase queries containing é and è chars like académie or
 ingénieur.

 As you will see the filter used in the text type uses the
 SnowballPorterFilterFactory for the english language,
 I plan to fix that by using the correct language for the index (French) and
 the following protwords http://bit.ly/i8JeX6 .

 But except this mistake with the stemmer, did I do something (else) wrong ?
 Did I overlook something ? What could
 explain I do not always get results for my phrase queries ?

 Thanks in advance for your feedback.

 Best Regards,

 --
 Jérôme



MySQL + DIH + SpatialSearch

2011-01-24 Thread Eric Angel
I had difficulties getting this to work, so hopefully this will help others
having the same issue.

My environment:

Solr 3.1
MySQL 5.0.77

Schema:
<fieldType name="location" class="solr.LatLonType"
subFieldSuffix="_coordinate"/>
<field name="latlng" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true"
stored="false"/>

DIH data-config:
<dataSource driver="java.lang.String"
url="jdbc:mysql://xxx.xxx.xxx.xxx/db1" user="user" password="secret"
readOnly="true" batchSize="-1"/>
<entity name="practice" pk="id" query="select id, name, concat_ws(',',
lat, lng) as latlng from practice"></entity>

I kept getting build errors similar to this:

org.apache.solr.common.SolrException:
org.apache.lucene.spatial.tier.InvalidGeoException: incompatible dimension
(2) and values ([B@2964a05d).  Only 0 values specified
at org.apache.solr.schema.PointType.createFields(PointType.java:77)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:199)
at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProc
essorFactory.java:60)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
at 
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHand
ler.java:291)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:
625)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:265
)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:184)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.ja
va:335)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393
)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:374)
Caused by: org.apache.lucene.spatial.tier.InvalidGeoException: incompatible
dimension (2) and values ([B@2964a05d).  Only 0 values specified
at 
org.apache.lucene.spatial.DistanceUtils.parsePoint(DistanceUtils.java:376)
at org.apache.solr.schema.PointType.createFields(PointType.java:75)

This would happen regardless of whether I used PointType, LatLonType, or
GeoHashField.

So I thought maybe I should pay attention to what the error says --
"incompatible dimension (2) and values ([B@2964a05d). Only 0 values
specified".  Looking at the code, this revealed that it's trying to parse
[B@2964a05d into a spatial field.  So my DIH was getting bad values --
apparently, there's a bug in MySQL 5.0:
http://bugs.mysql.com/bug.php?id=12030 -- where concat changes the character
set to binary.

To solve this, you can either:
* upgrade to MySQL 5.5 (according to the bug page, it was fixed in 5.5, but
I haven't tested it).
* Or you can typecast before you concat:
  <entity name="practice" pk="id" query="select id, name, concat_ws(',',
  cast(lat as char), cast(lng as char)) as latlng from practice"></entity>

Eric




Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Simon Wistow
We have two slaves replicating off one master every 2 minutes.

Both using the CMS + ParNew Garbage collector. Specifically

-server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
-XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing

but periodically they both get into a GC storm and just keel over.

Looking through the GC logs the amount of memory reclaimed in each GC 
run gets less and less until we get a concurrent mode failure and then 
Solr effectively dies.
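One thing we may try next (untested on our setup; these are standard
HotSpot flags, and the value needs tuning) is to stop relying on the
incremental mode heuristics and pin when CMS kicks in:

-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly

so the concurrent collection starts well before the old generation fills
up.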

Is it possible there's a memory leak? I note that later versions of 
Lucene have fixed a few leaks. Our current versions are relatively old

Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 
18:06:42

Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55

so I'm wondering if upgrading to a later version of Lucene might help (of 
course it might not, but I'm trying to investigate all options at this 
point). If so, what's the best way to go about this? Can I just grab the 
Lucene jars and drop them somewhere (or unpack and then repack the Solr 
war file)? Or should I use a nightly Solr 1.4?

Or am I barking up completely the wrong tree? I'm trawling through heap 
logs and gc logs at the moment trying to to see what other tuning I can 
do but any other hints, tips, tricks or cluebats gratefully received. 
Even if it's just Yeah, we had that problem and we added more slaves 
and periodically restarted them

thanks,

Simon


Re: Faceting Question

2011-01-24 Thread beaviebugeater

Hmm, thanks for the response.  I'll play around with it and see if that
helps. 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceting-Question-tp2320542p2321887.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: DIH serialize

2011-01-24 Thread Papp Richard
Hi Stefan,

  yes, this is exactly what I intend - I don't want to search in this field,
just quickly return the result in a serialized form (the search criteria
are on other fields). Well, if I could serialize the data exactly like
PHP's serialize() does I would be maximally satisfied, but any other form
in which I could easily compact the data into one field would please me.
  Can anyone help me? I guess the script is quite a good way, but I don't
know which function I should use there to compact the data so it is easily
usable in PHP. Or any other method?
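  Something like this is roughly what I have in mind - a rough, untested
sketch of the script transformer, building a JSON string by hand so PHP can
json_decode() it (it assumes the row only holds flat string/number values):

  <script><![CDATA[
    function my_serialize(row) {
      // row is a java.util.Map; walk its keys and build a JSON object string
      var keys = row.keySet().toArray();
      var parts = [];
      for (var i = 0; i < keys.length; i++) {
        var k = keys[i];
        // escape backslashes and double quotes so the JSON stays valid
        var v = String(row.get(k)).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
        parts.push('"' + k + '":"' + v + '"');
      }
      row.put('main_timetable', '{' + parts.join(',') + '}');
      return row;
    }
  ]]></script>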

thanks,
  Rich

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@googlemail.com] 
Sent: Monday, January 24, 2011 18:23
To: solr-user@lucene.apache.org
Subject: Re: DIH serialize

Hi Rich,

i'm a bit confused after reading your post .. what exactly you wanna try to
achieve? Serializing (like http://php.net/serialize) your complete row into
one field? Don't wanna search in them, just store and deliver them in your
results? Does that make sense? Sounds a bit strange :)

Regards
Stefan

On Mon, Jan 24, 2011 at 10:03 AM, Papp Richard ccode...@gmail.com wrote:

 Hi Dennis,

  thank you for your answer, but didn't understand why you say it doesn't
 need serialization. I'm with the option C.
  but the main question is, how to put into one field a result of many
 fields: SELECT * FROM.

 thanks,
  Rich

 -Original Message-
 From: Dennis Gearon [mailto:gear...@sbcglobal.net]
 Sent: Monday, January 24, 2011 02:07
 To: solr-user@lucene.apache.org
 Subject: Re: DIH serialize

 Depends on your process chain to the eventual viewer/consumer of the data.

 The questions to ask are:
  A/ Is the data IN Solr going to be viewed or processed in its original
 form:
  --set stored = 'true'
 ---no serialization needed.
  B/ If it's going to be analyzed and searched for separate from any other
 field, the analyzing will put it into an unreadable form. If you need to
 see it, then
 ---set indexed=true and stored=true
 ---no serialization needed.
  C/ If it's NOT going to be viewed AS IS, and it's not going to be searched
 for AS IS, (i.e. other columns will be how the data is found), and you have
 another, serializable format:
   --set indexed=false and stored=true
   --serialize AS PER THE INTENDED APPLICATION,
   not sure that Solr can do that at all.
  D/ If it's NOT going to be viewed AS IS, BUT it's going to be searched for
 AS IS, (this column will be how the data is found), and you have another,
 serializable format:
   --you need to put it into TWO columns
   --A SERIALIZED FIELD
   --set indexed=false and stored=true

   --AN UNSERIALIZED FIELD
   --set indexed=false and stored=true
   --serialize AS PER THE INTENDED APPLICATION,
   not sure that Solr can do that at all.

 Hope that helps!


 Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better
 idea to learn from others' mistakes, so you do not have to make them
 yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.



 - Original Message 
 From: Papp Richard ccode...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Sun, January 23, 2011 2:02:05 PM
 Subject: DIH serialize

 Hi all,



  I wasted the last few hours trying to serialize some column values (from
 mysql) into a Solr column, but I just can't find such a function. I'll use
 the value in PHP - I don't know if it is possible to serialize in PHP
style
 at all. This is what I tried and works with a given factor:



 in schema.xml:

   <field name="main_timetable" type="text" indexed="false"
 stored="true" multiValued="true" />



 in DIH xml:



 <dataConfig>

   <script><![CDATA[
     function my_serialize(row)
     {
       row.put('main_timetable', row.toString());
       return row;
     }
   ]]></script>

 .

   <entity name="main_timetable"
           query="SELECT * FROM shop_time_table stt WHERE stt.shop_id = '${shop.id}'"
           transformer="script:my_serialize">



 .

 



  Can I use java directly in script (<script language="Java">) ?

  How could I achieve this? Or any other idea?

  I need these values together (from a row) and I need then in PHP to
handle
 the result easily.



 thanks,

  Rich



Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Em

Hi Simon,

I have no experience with a distributed environment.
However, what you are talking about reminds me of another post on the
mailing list.

Could it be that your slaves haven't finished replicating before the new
replication process starts?
If so, there you got the OOM :).

Just a thought, perhaps it helps.

Regards,
Em
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Possible-Memory-Leaks-Upgrading-to-a-Later-Version-of-Solr-or-Lucene-tp2321777p2321959.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH serialize

2011-01-24 Thread greggallen
UNSUBSCRIBE

On 1/23/11, Papp Richard ccode...@gmail.com wrote:
 Hi all,



   I wasted the last few hours trying to serialize some column values (from
 mysql) into a Solr column, but I just can't find such a function. I'll use
 the value in PHP - I don't know if it is possible to serialize in PHP style
 at all. This is what I tried and works with a given factor:



 in schema.xml:

 <field name="main_timetable" type="text" indexed="false"
  stored="true" multiValued="true" />



 in DIH xml:



 <dataConfig>

   <script><![CDATA[
     function my_serialize(row)
     {
       row.put('main_timetable', row.toString());
       return row;
     }
   ]]></script>

 .

   <entity name="main_timetable"
           query="SELECT * FROM shop_time_table stt WHERE stt.shop_id = '${shop.id}'"
           transformer="script:my_serialize">

 

 .





   Can I use java directly in script (<script language="Java">) ?

   How could I achieve this? Or any other idea?

   I need these values together (from a row) and I need then in PHP to handle
 the result easily.



 thanks,

   Rich




Re: Getting started with writing parser

2011-01-24 Thread Gora Mohanty
On Mon, Jan 24, 2011 at 2:28 PM, Dinesh mdineshkuma...@karunya.edu.in wrote:

 my solrconfig.xml

 http://pastebin.com/XDg0L4di

 my schema.xml

 http://pastebin.com/3Vqvr3C0

 my try.xml

 http://pastebin.com/YWsB37ZW
[...]

OK, thanks for the above.

You also need to:
* Give us a sample of your log files (for crying out loud,
  this has got to be the fifth time that I have asked you
  for this).
* Tell us what happens when you run with the above
   configuration. From a cursory look at try.xml, you
   have not really understood how it works, or how to
   configure it for your needs.

Regards,
Gora


Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Simon Wistow
On Mon, Jan 24, 2011 at 08:00:53PM +0100, Markus Jelsma said:
 Are you using 3rd-party plugins?

No third party plugins - this is actually pretty much stock tomcat6 + 
solr from Ubuntu. The only difference is that we've adapted the 
directory layout to fit in with our house style


Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Simon Wistow
On Mon, Jan 24, 2011 at 10:55:59AM -0800, Em said:
 Could it be possible that your slaves not finished their replicating until
 the new replication-process starts?
 If so, there you got the OOM :).

This was one of my thoughts as well - we're currently running a slave 
which has no queries in it just to see if that exhibits similar 
behaviour.

My reasoning against it is that we're not seeing any 

PERFORMANCE WARNING: Overlapping onDeckSearchers=x

in the logs which is something I'd expect to see.

2 minutes doesn't seem like an unreasonable period of time either - the 
docs at http://wiki.apache.org/solr/SolrReplication suggest 20 seconds.




Re: Highlighting with/without Term Vectors

2011-01-24 Thread Salman Akram
Just to add one thing, in case it makes a difference.

The maximum document size on which highlighting needs to be done is a few
hundred KB (in the file system). In the index it's compressed, so it should
be much smaller. Total documents number more than 100 million.
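For reference, the term-vector flags in our schema look roughly like this
(field name changed):

<field name="contents" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

As far as I understand, dropping those three attributes (and reindexing)
removes the *.tvx/*.tvd/*.tvf files entirely; the highlighter then
re-analyzes the stored text for each snippet instead of reading the vectors
from disk.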

On Tue, Jan 25, 2011 at 12:42 AM, Salman Akram 
salman.ak...@northbaysolutions.net wrote:

 Hi,

 Does anyone have any benchmarks how much highlighting speeds up with Term
 Vectors (compared to without it)? e.g. if highlighting on 20 documents take
 1 sec with Term Vectors any idea how long it will take without them?

  I need to know since the index used for highlighting has a TVF file of
  around 450GB (approx 65% of total index size), so I am trying to see whether
  decreasing the index size by dropping TVF would be more helpful for
  performance (less RAM, should be good for I/O too I guess) or whether
  keeping it is still better.

  I know the best way is to try it out, but indexing takes a very long time,
  so I am trying to see whether it's even worthwhile or not.

 --
 Regards,

 Salman Akram




-- 
Regards,

Salman Akram


Re: please help Problem with dataImportHandler

2011-01-24 Thread Chris Hostetter
: this is the error that i'm getting.. no idea of what is it..

Did you follow the instructions in the error message and look at your solr 
log file to see what the severe errors in solr configuration might be?

: SimplePostTool: FATAL: Solr returned an error: 
: 
Severe_errors_in_solr_configuration__Check_your_log_files_for_more_detailed_information_on_what_may_be_wrong
...

-Hoss


Re: No system property or default value specified for...

2011-01-24 Thread Chris Hostetter

: I'm trying to dynamically add a core to a multi core system using the
: following command:
: 
: 
http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceDir=items&config=data-config.xml&schema=schema.xml&dataDir=data&persist=true
: 
: the data-config.xml looks like this:
: 
: dataConfig

I think you are using the config param incorrectly -- it should be the 
solrconfig.xml file you want to use (assuming you don't want the one found 
in the conf directory of your instanceDir)

that's the reason you are getting errors about needing to specify system 
props or default values for all those variables: if that file were 
a solrconfig.xml file they would have to be specified before the SolrCore 
could be initialized -- but for DIH data configs that's not necessary.
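i.e. something more like this (a sketch -- the DIH data-config.xml is
referenced from inside that solrconfig.xml via the dataimport request
handler, not on the CREATE url):

http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceDir=items&config=solrconfig.xml&schema=schema.xml&dataDir=data&persist=true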


-Hoss


Re: searching based on grouping result

2011-01-24 Thread Chris Hostetter

: Subject: searching based on grouping result
: In-Reply-To: 913367.31366...@web121705.mail.ne1.yahoo.com
: References: 913367.31366...@web121705.mail.ne1.yahoo.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: Weird behaviour with phrase queries

2011-01-24 Thread Erick Erickson
Hmmm, I don't see any screen shots. Several things:
1> If your stopword file has comments, I'm not sure what the effect would
be.
2> Something's not right here, or I'm being fooled again. Your withresults
xml has this line:
<str name="parsedquery">+DisjunctionMaxQuery((meta_text:"ecol d
ingenieur")~0.01) ()</str>
and your noresults has this line:
<str name="parsedquery">+DisjunctionMaxQuery((meta_text:"academi
charpenti")~0.01) DisjunctionMaxQuery((meta_text:"academi
charpenti"~100)~0.01)</str>

the empty () in the first one often means you're NOT going to your
configured dismax parser in solrconfig.xml. Yet that doesn't square with
your custom qt, so I'm puzzled.

Could we see your raw query string on the way in? It's almost as if you
defined qt in one and defType in the other, which are not equivalent.
3> It may take 12 hours to index, but you could experiment with a smaller
subset. You say you know that the noresults one should return documents;
what proof do you have? If there's a single document that you know should
match this, just index it and a few others and you should be able to make
many runs until you get to the bottom of this...

And obviously your stemming is happening on the query, are you sure it's
happening at index time too?

Best
Erick

On Mon, Jan 24, 2011 at 1:51 PM, Jerome Renard jerome.ren...@gmail.comwrote:

 Hi Em, Erick

 thanks for your feedback.

 Em: yes. Here is the stopwords.txt I use:
 -
 http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt

 On Mon, Jan 24, 2011 at 6:58 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 Try submitting your query from the admin page with debugQuery=on and see
 if that helps. The output is pretty dense, so feel free to cut-paste the
 results for
 help.

 Your stemmers have English as the language, which could also be
 interesting.


 Yes, I noticed that this will be fixed.


 As Em says, the analysis page may help here, but I'd start by taking out
 WordDelimiterFilterFactory, SnowballPorterFilterFactory and
 StopFilterFactory
 and build back up if you really need them. Although, again, the analysis
 page
 that's accessible from the admin page may help greatly (check debug in
 both
 index and query).


 You will find attached two xml files one with no results (noresult.xml.gz)
 and one with
 a lot of results (withresults.xml.gz). You will also find attached two
 screenshots showing
 there is a highlighted section in the Index analyzer section when
 analysing text.


 Oh, and you MUST re-index after changing your schema to have a true test.


 Yes, the problem is that reindexing takes around 12 hours which makes it
 really hard
 for testing :/


 Thanks in advance for your feedback.

 Best Regards,

 --
 Jérôme



Re: Solr with Unknown Lucene Index?

2011-01-24 Thread Chris Hostetter

: Having found some code that searches a Lucene index, the only analyzers
: referenced are Lucene.Net.Analysis.Standard.StandardAnalyzer.
: 
: How can I map this is Solr? The example schema doesn't seem to mention this,
: and specifying 'text' or 'string' for every field doesn't seem to help.

1) that analyzer seems to be a Lucene.Net analyzer, so the java equivalent 
would be org.apache.lucene.analysis.standard.StandardAnalyzer

2) the example schema.xml demonstrates how to use an existing Analyzer 
implementation...

<!-- One can also specify an existing Analyzer class that has a
 default constructor via the class attribute on the analyzer element
<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
-->

3) i'm getting the sense from your comments that you aren't very familiar 
with lucene/solr in general.  An important thing to understand is that 
just because the code that created the index only ever uses 
StandardAnalyzer doesn't mean it will make sense to use that analyzer on 
every field when attempting to search that field from solr -- some fields 
may have been indexed w/o using any analysis, some may be numeric fields 
with special encoding, some may be compressed, etc...

trying to reverse engineer what the schema should look like to open any 
arbitrary index requires a lot of understanding about how that index was 
built -- it's easy to just dump the terms found in an index w/o knowing 
anything about where those terms came from (that's what Luke does) but that 
doesn't help you recognize things like "this list of X words were treated 
as stop words, and don't appear in the index, so my query analyzer needs 
to be configured with those same X words".

In short: you can easily make solr *read* the index (just like luke) but 
that won't necessarily help you *use* the index in a meaningful way.

-Hoss


Re: Specifying an AnalyzerFactory in the schema

2011-01-24 Thread Chris Hostetter

: I notice that in the schema, it is only possible to specify a Analyzer class,
: but not a Factory class as for the other elements (Tokenizer, Fitler, etc.).
: This limits the use of this feature, as it is impossible to specify parameters
: for the Analyzer.
: I have looked at the IndexSchema implementation, and I think this requires a
: simple fix. Do I open an issue about it ?

Support for constructing Analyzers directly is very crude, and primarily 
existed for making it easy for people with old indexes and analyzers to 
keep working.

moving forward, Lucene/Solr eventually won't ship concrete Analyzer 
implementations at all (at least, that's the last consensus i remember) so 
enhancing support for loading Analyzers (or AnalyzerFactories) doesn't 
make much sense.

Practically speaking, if you have an existing Analyzer that you want to 
use in Solr, instead of writing an AnalyzerFactory for it, you could 
just write a TokenizerFactory that wraps it instead -- functionally that 
would let you achieve everything an AnalyzerFactory would, except that 
Solr would already handle letting the schema.xml specify the 
positionIncrementGap (which you could happily ignore if you wanted)


-Hoss


Solr set up issues with Magento

2011-01-24 Thread solrEvaluation

Hello Team:


  I am in the process of setting up Solr 1.4 with Magento Enterprise Edition
1.9.

When I try to index the products I get the following error message.

Jan 24, 2011 3:30:14 PM org.apache.solr.update.processor.LogUpdateProcessor
fini
sh
INFO: {} 0 0
Jan 24, 2011 3:30:14 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'in_stock'
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.jav
a:289)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpd
ateProcessorFactory.java:60)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Co
ntentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandl
erBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter
.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilte
r.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appl
icationFilterChain.java:244)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationF
ilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperV
alve.java:240)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextV
alve.java:161)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.j
ava:164)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.j
ava:100)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:
550)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineVal
ve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.jav
a:380)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java
:243)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.proce
ss(Http11Protocol.java:188)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.proce
ss(Http11Protocol.java:166)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoin
t.java:288)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec
utor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:908)
at java.lang.Thread.run(Thread.java:662)

Jan 24, 2011 3:30:14 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={wt=json} status=400 QTime=0
Jan 24, 2011 3:30:14 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jan 24, 2011 3:30:14 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
Jan 24, 2011 3:30:14 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {rollback=} 0 16
Jan 24, 2011 3:30:14 PM org.apache.solr.core.SolrCore execute

I am new to both Magento and Solr. I could have done something stupid
during installation. I really look forward to your help.

Thank you,
Sandhya
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-set-up-issues-with-Magento-tp2323858p2323858.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr set up issues with Magento

2011-01-24 Thread Markus Jelsma
Hi,

You haven't defined the 'in_stock' field in Solr's schema.xml configuration, so 
it needs to be added first. Perhaps following the tutorial would be a good idea.

http://lucene.apache.org/solr/tutorial.html
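
For example, a declaration along these lines inside the <fields> section of 
schema.xml would define it (the "boolean" type is an assumption -- check what 
Magento's integration actually sends for in_stock):

  <field name="in_stock" type="boolean" indexed="true" stored="true"/>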

Cheers.

 Hello Team:
 
 
   I am in the process of setting up Solr 1.4 with Magento Enterprise
 Edition 1.9.
 
 When I try to index the products I get the following error message.
 
 [error log snipped -- identical to the log quoted in the original message above]
 
 I am new to both Magento and Solr. I could have done something stupid
 during installation. I really look forward to your help.
 
 Thank you,
 Sandhya


Re: Stemming for Finnish language

2011-01-24 Thread Chris Hostetter

: I tried following in my schema.xml, but I got
: org.apache.solr.common.SolrException: Error loading class
: 'solr.FinnishLightStemFilterFactory'

FinnishLightStemFilterFactory is a class that exists in SVN on the 3x and 
trunk branches, but does not exist in the Solr 1.4.1 release (it was added 
later)

If you are trying to use Solr 1.4.1, this won't work. If you are 
getting this error using a 3x or trunk development version, please 
elaborate on how you are installing/running Solr.
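
For reference, on a 3.x or trunk build a field type along these lines should 
pick it up (just a sketch -- adjust the tokenizer and the other filters to 
taste):

  <fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.FinnishLightStemFilterFactory"/>
    </analyzer>
  </fieldType>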


-Hoss


synonyms file, and example cases

2011-01-24 Thread Cam Bazz
Hello,

I have been looking at the solr synonym file that was an example, I
did not understand some notation:

aaa => aaaa

bbb => bbbb1 bbbb2

ccc => cccc1,cccc2

a\=>a => b\=>b

a\,a => b\,b

fooaaa,baraaa,bazaaa

The first one says: search for aaaa when the query is aaa. Am I correct?
The second one finds bbbb1 bbbb2 when the query is bbb.
The third one finds cccc1 or cccc2 when the query is ccc.

The fourth and fifth ones I have not understood.

The last one, I assume, is a group: a bidirectional mapping between
fooaaa, baraaa and bazaaa.

I am especially interested in this last one: if I write aaa,bbb, will it
find aaa and bbb when either aaa or bbb is queried?

Am I correct in those assumptions?
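
For context, a typical way to wire this file in (as I understand 
SynonymFilterFactory; expand="true" is what I believe makes the comma-only 
groups bidirectional):

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>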

Best regards,
C.B.


Re: How can I make one request for all cores and get responses classified by cores

2011-01-24 Thread Chris Hostetter

: I have a group of subindex, each of which is a core in my solr now. I want
: to make one query for some of them, how can I do that? And classify response
: doc by index, using facet search?

some background:

Multi-core is when you have multiple Solr cores on one Solr instance;
each core can have different configs.

Distributed search is when you execute a search on a core and specify 
in the query a list of other cores on other Solr instances to treat as 
shards, aggregating the results from all of them; each shard must 
have an identical schema.

That said: you can do a distributed search across a bunch of shards 
that are all on the same Solr instance.  If you index a constant value in 
each one identifying which sub-index it comes from, you should have what 
you're looking for.
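
For example (a sketch -- this assumes two cores named core0 and core1 on one 
instance, and that the constant field is called subindex):

  http://localhost:8983/solr/core0/select?q=foo
      &shards=localhost:8983/solr/core0,localhost:8983/solr/core1
      &facet=true&facet.field=subindex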

-Hoss


Re: Adding weightage to the facets count

2011-01-24 Thread Chris Hostetter

: prod1 has tag called “Light Weight” with weightage 20,
: prod2 has tag called “Light Weight” with weightage 100,
: 
: If i get facet for “Light Weight” , i will get Light Weight (2) ,
: here i need to consider the weightage in to account, and the result will be
: Light Weight (120) 
: 
: How can we achieve this?Any ideas are really helpful.


It's not really possible with Solr out of the box.  Faceting is fast and 
efficient in Solr because it's all done using set intersections (and most 
of the sets can be kept in RAM very compactly and reused).  For what you 
are describing you'd need to not only associate a weighted payload with 
every TermPosition, but also factor that weight in when doing the 
faceting, which means efficient set operations are now out the window.

If you know Java it would probably be possible to write a custom 
SolrPlugin (a SearchComponent) to do this type of faceting in special 
cases (assuming you indexed in a particular way), but I'm not sure off the 
top of my head how well it would scale -- the basic algorithm I'm thinking of 
is (after indexing each facet term with a weight payload) to iterate over 
the DocSet of all matching documents in parallel with an iteration over 
the TermPositions, skipping ahead to only the docs that match the query, and 
recording the sum of the payloads for each term.

Hmmm...

except TermPositions iterates over term, doc, freq, position tuples, 
so you would have to iterate over every term, and for every term then loop 
over all matching docs ... like I said, not sure how efficient it would 
wind up being.
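
Very roughly, that per-term loop might look like this (untested, pre-4.0 
APIs; the float-encoded payloads via PayloadHelper, and reading only the 
first position's payload per doc, are assumptions):

  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;

  import org.apache.lucene.analysis.payloads.PayloadHelper;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermEnum;
  import org.apache.lucene.index.TermPositions;
  import org.apache.solr.search.DocSet;

  public class WeightedFacetSketch {
    /** Sums a float payload per term of the given field over the matching docs. */
    public static Map<String, Float> sumWeights(IndexReader reader,
        DocSet matches, String field) throws IOException {
      Map<String, Float> weights = new HashMap<String, Float>();
      TermEnum terms = reader.terms(new Term(field, ""));
      try {
        do {
          Term t = terms.term();
          if (t == null || !field.equals(t.field())) break; // past our field
          float sum = 0f;
          TermPositions tp = reader.termPositions(t);
          try {
            while (tp.next()) {
              if (!matches.exists(tp.doc())) continue; // only query matches
              tp.nextPosition(); // must advance before the payload is readable
              if (tp.isPayloadAvailable()) {
                sum += PayloadHelper.decodeFloat(tp.getPayload(new byte[4], 0));
              }
            }
          } finally {
            tp.close();
          }
          if (sum > 0f) weights.put(t.text(), sum);
        } while (terms.next());
      } finally {
        terms.close();
      }
      return weights;
    }
  }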

You might be happier all around if you just do some sampling -- store the 
tag+weight pairs so that they can be retrieved with each doc, and then 
when you get your top facet constraints back, look at the first page of 
results, and figure out what the summed weight is for each of those 
constraints based solely on the page #1 results.

I've had happy users using a similar approach in the past.

-Hoss

Re: Getting started with writing parser

2011-01-24 Thread Dinesh

http://pastebin.com/CkxrEh6h

this is my sample log

-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2326646.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: please help Problem with dataImportHandler

2011-01-24 Thread Dinesh

http://pastebin.com/tjCs5dHm

this is the log produced by the solr server

-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/please-help-Problem-with-dataImportHandler-tp2318585p2326659.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr suggester and spell checker

2011-01-24 Thread madhug

Hi, 
I am using the default example in the latest stable build
(apache-solr-4.0-2011-01-23_11-24-01). 

I read the wiki on http://wiki.apache.org/solr/Suggester and my expectation
is that suggester would correct terms in addition to completing terms.
The handler for suggest is configured with spellcheck as true.

<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    ..
  </lst>
</requestHandler>

However, the query http://localhost:8983/solr/suggest?q=belkn%20enc
returns <str name='collation'>belkn encoded</str> (belkn is not corrected to
belkin).

The spellchecker component corrects belkn to belkin though.
http://localhost:8983/solr/spell?q=belkn%20encoded&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
<str name='collation'>belkin encoded</str>

Would really appreciate any input on how suggester can correct as well as
complete terms in the input.

Thanks
Madhu
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-suggester-and-spell-checker-tp2326907p2326907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting started with writing parser

2011-01-24 Thread Dinesh

I don't even know whether the regex expression that I'm using for my log is
correct or not. I am very worried that I can't proceed with my project;
already a third of the time is over. Please help -- this is just the first
stage. After this I have to set up all the logs to be redirected to SYSLOG,
and from there I'll send them to the SOLR server. Then I have to analyse all
the data that I obtain from DNS, DHCP, WIFI and SWITCHES, and prepare a
user-based report on each user's actions. Please help me, because the days I
have keep reducing and my project leader is questioning me a lot.

-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2326917.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud Questions for MultiCore Setup

2011-01-24 Thread Em

Hi,

just wanted to push this topic again.

Thank you!


Em wrote:
 
 By the way: although I am asking for SolrCloud explicitly again, I will
 take your advice and try distributed search first to understand the
 concept better.
 
 Regards
 
 
 Em wrote:
 
 Hi Lance,
 
 thanks for your explanation.
 
 As far as I know, in distributed search I have to tell Solr what other
 shards it has to query. So, if I want to query a specific core, present
 in all my shards, I could tell Solr this by using the shards-param plus
 the specified core on each shard.
 
 Using SolrCloud's distrib=true feature (it sets all the known shards
 automatically?), a collection should consist of only one type of
 core-schema, correct?
 How does SolrCloud know that shard_x and shard_y are replicas of
 each other (I took a look at the possibility to specify alternative
 shards if one is not available)? If it does not know that they are
 replicas of each other, I should use the syntax of specifying alternative
 shards for failover, for performance reasons, because querying 2
 identical and available cores seems to be wasted capacity, no? 
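
 For reference, the failover syntax I was looking at separates equivalent 
 shards with a pipe (my reading of the SolrCloud wiki, so treat this as a 
 sketch):

   shards=localhost:8983/solr|localhost:8900/solr,localhost:7574/solr|localhost:7500/solr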
 
 Thank you!
 
 
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Questions-for-MultiCore-Setup-tp2309443p2327089.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: old index files not deleted on slave

2011-01-24 Thread feedly team
Interestingly that worked. I deleted the slave index and restarted.
After the first replication I shut down the server, deleted the lock
file and started it again. It seems to be behaving itself now even
though a lock file seems to be recreated. Thanks a lot for the help.
This still seems like a bug though?

I don't have any writers open on the slaves, in fact one slave is only
doing replication right now (no reads) to try to isolate the problem.

On Sat, Jan 22, 2011 at 7:34 PM, Alexander Kanarsky
kanarsky2...@gmail.com wrote:
 I see the file

 -rw-rw-r-- 1 feeddo feeddo    0 Dec 15 01:19
 lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock

 was created on Dec. 15. At the end of the replication, as far as I
 remember, the SnapPuller tries to open the writer to ensure the old
 files are deleted, and in
 your case it cannot obtain a lock on the index folder on Dec 16,
 17,18. Can you reproduce the problem if you delete the lock file,
 restart the slave
 and try replication again? Do you have any other Writer(s) open for
 this folder outside of this core?

 -Alexander

 On Sat, Jan 22, 2011 at 3:52 PM, feedly team feedly...@gmail.com wrote:
 The file system checked out, I also tried creating a slave on a
 different machine and could reproduce the issue. I logged SOLR-2329.

 On Sat, Dec 18, 2010 at 8:01 PM, Lance Norskog goks...@gmail.com wrote:
 This could be a quirk of the native locking feature. What's the file
 system? Can you fsck it?

 If this error keeps happening, please file this. It should not happen.
 Add the text above and also your solrconfigs if you can.

 One thing you could try is to change from the native locking policy to
 the simple locking policy - but only on the child.
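
 For reference, that's the lockType setting in solrconfig.xml (a sketch -- it 
 lives in the mainIndex section, with a matching one under indexDefaults):

   <mainIndex>
     <lockType>simple</lockType>
   </mainIndex>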

 On Sat, Dec 18, 2010 at 4:44 PM, feedly team feedly...@gmail.com wrote:
 I have set up index replication (triggered on optimize). The problem I
 am having is the old index files are not being deleted on the slave.
 After each replication, I can see the old files still hanging around
 as well as the files that have just been pulled. This causes the data
 directory size to increase by the index size every replication until
 the disk fills up.

 Checking the logs, I see the following error:

 SEVERE: SnapPull failed
 org.apache.solr.common.SolrException: Index fetch failed :
        at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
        at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:265)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
        at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
 Caused by: org.apache.lucene.store.LockObtainFailedException: Lock
 obtain timed out:
 NativeFSLock@/var/solrhome/data/index/lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1065)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:954)
        at 
 org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:192)
        at 
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:99)
        at 
 org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
        at 
 org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
        at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:471)
        at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
        ... 11 more

 lsof reveals that the file is still opened from the java process.

 I am running 4.0 rev 993367 with patch SOLR-1316. Otherwise, the setup
 is pretty vanilla. The OS is linux, the indexes are on local
 directories, write permissions look ok, nothing unusual in the config
 (default deletion policy, etc.). Contents of the index data dir:

 master:
 -rw-rw-r-- 1 feeddo feeddo  191 Dec 14 01:06 _1lg.fnm
 -rw-rw-r-- 1 feeddo feeddo  26M Dec 14 01:07 _1lg.fdx
 -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 14 01:07 _1lg.fdt
 -rw-rw-r-- 1 feeddo feeddo 474M Dec 14 01:12 _1lg.tis
 -rw-rw-r-- 1 feeddo feeddo  15M Dec 14 01:12 

Re: Weird behaviour with phrase queries

2011-01-24 Thread Jerome Renard
Erick,

On Mon, Jan 24, 2011 at 9:57 PM, Erick Erickson erickerick...@gmail.comwrote:

 Hmmm, I don't see any screen shots. Several things:
 1 If your stopword file has comments, I'm not sure what the effect would
 be.


Ha, I thought comments were supported in stopwords.txt


 2 Something's not right here, or I'm being fooled again. Your withresults
 xml has this line:
 <str name="parsedquery">+DisjunctionMaxQuery((meta_text:ecol d
 ingenieur)~0.01) ()</str>
 and your noresults has this line:
 <str name="parsedquery">+DisjunctionMaxQuery((meta_text:academi
 charpenti)~0.01) DisjunctionMaxQuery((meta_text:academi
 charpenti~100)~0.01)</str>

 the empty () in the first one often means you're NOT going to your
 configured dismax parser in solrconfig.xml. Yet that doesn't square with
 your custom qt, so I'm puzzled.

 Could we see your raw query string on the way in? It's almost as if you
 defined qt in one and defType in the other, which are not equivalent.


You are right I fixed this problem (my bad).

3 It may take 12 hours to index, but you could experiment with a smaller
 subset. You say you know that the noresults one should return documents;
 what proof do you have? If there's a single document that you know should
 match this, just index it and a few others, and you should be able to make
 many runs until you get to the bottom of this...


I could, but I always thought I had to fully re-index after updating
schema.xml. If I update only a few documents, will that take the changes
into account without breaking the rest?


 And obviously your stemming is happening on the query; are you sure it's
 happening at index time too?


Since you did not get the screenshots you will find attached the full output
of the analysis
for a phrase that works and for another that does not.

Thanks for your support

Best Regards,

--
Jérôme


analysis-noresults.html.gz
Description: GNU Zip compressed data


analysis-withresults.html.gz
Description: GNU Zip compressed data