RE: DIH import and postImportDeleteQuery

2011-05-25 Thread Ephraim Ofir
Search the list for my post "DIH - deleting documents, high performance
(delta) imports, and passing parameters" which shows my solution to a
similar problem.

Ephraim Ofir

-Original Message-
From: Alexandre Rocco [mailto:alel...@gmail.com] 
Sent: Tuesday, May 24, 2011 11:24 PM
To: solr-user@lucene.apache.org
Subject: DIH import and postImportDeleteQuery

Guys,

I am facing a situation in one of our projects where I need to perform a
cleanup to remove some documents after we perform an update via DIH.
The big issue right now is that when we call the DIH with clean=false,
the postImportDeleteQuery is not executed.

My setup is currently arranged like this:
- A SQL Server stored procedure that receives a parameter (specified in
the URL) and returns the records to be indexed
- The procedure is able to return all the records (for a full-import) or
only the updated records (for a delta-import)
- The procedure returns both valid and deleted records; hence the need
to run a postImportDeleteQuery to remove the deleted ones.

Everything works fine when I run a full-import: I always run with
clean=true, and then the whole index is rebuilt.
When I need to do an incremental update, the records are updated
correctly, but the command to delete the other records is not executed.

I've tried several combinations, with different results:
- Running full-import with clean=false: the records are updated but the
ones that need to be deleted stay in the index
- Running delta-import with clean=false: the records are updated but the
ones that need to be deleted stay in the index
- Running delta-import with clean=true: all records are deleted from the
index, and then only the records returned by the procedure (except the
deleted ones) are in the index.

I don't see any way to achieve my goal without changing the process
that I use to obtain the data.
Since this is a very complex stored procedure, with tons of joins and
custom processing, I am trying everything to avoid messing with it.

See below a copy of my data-config.xml file. I made it simpler by
omitting all the fields, since they are out of scope of the issue:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=password;responseBuffering=adaptive;"
  />
  <document>
    <entity name="entity_one"
      pk="entityid"
      transformer="RegexTransformer"
      query="EXEC some_stored_procedure ${dataimporter.request.someid}"
      preImportDeleteQuery="status:1" postImportDeleteQuery="status:1">
      <field column="field1" name="field1" splitBy=";" />
      <field column="field2" name="field2" splitBy=";" />
      <field column="field3" name="field3" splitBy=";" />
    </entity>

    <entity name="entity_two"
      pk="entityid"
      transformer="RegexTransformer"
      query="EXEC someother_stored_procedure ${dataimporter.request.someotherid}"
      preImportDeleteQuery="status:1" postImportDeleteQuery="status:1">
      <field column="field1" name="field1" />
      <field column="field2" name="field2" />
      <field column="field3" name="field2" />
    </entity>
  </document>
</dataConfig>

Any ideas or pointers that might help on this one?

Many thanks,
Alexandre


Re: MaxWarming Searcher

2011-05-25 Thread Grijesh
maxWarmingSearchers should be 2 as a best practice.
Either your commit frequency is too high, or you are autowarming too many
queries on the master.
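
(For reference, this is the relevant setting in solrconfig.xml; the value
shown is the suggested one:)

<maxWarmingSearchers>2</maxWarmingSearchers>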

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 


Returning documents using multi-valued field

2011-05-25 Thread Kurt Sultana
Hi all,

I'm quite new to Solr and I'm supporting an existing Solr search engine
which was written by someone else. I've been reading up on Solr for the
last couple of weeks, so I'd consider myself beyond the basics.

A particular field, let's say "name", is multi-valued. For example, a
document has a field "name" with values "Alice" and "Trudy". We want the
document to be returned when "Alice" or "Trudy" is entered, and not when
"Alice Trudy" is entered. Currently the document is returned even with
"Alice Trudy". How could this be done?
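
(For context, the mechanism usually involved here is the
positionIncrementGap on the field type: a large gap keeps a phrase query
such as "Alice Trudy" from matching across two separate values, assuming
the input is actually issued as a phrase query. A sketch, with
illustrative names and gap value:)

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  ...
</fieldType>
<field name="name" type="text" indexed="true" stored="true" multiValued="true"/>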

Thanks a lot!
Kurt


Re: Termscomponent sort question

2011-05-25 Thread antonio
No one has an idea?



Re: adding results external to index

2011-05-25 Thread abhayd
Any help? It can be done outside of the Solr application, but I just
wanted to know if Solr has some features for supporting this.



problem in setting field attribute in schema.xml

2011-05-25 Thread Romi
In my schema.xml file I made a field's attributes indexed=false and
stored=true, i.e. I am not indexing this field, but I am still getting
values for this field in my search results. Why is that so? Any idea?

-
Romi


Re: problem in setting field attribute in schema.xml

2011-05-25 Thread Romi
Please reply, I am not getting replies to any of my problems in this forum.

-
Romi


Escaping equals-sign in external file field

2011-05-25 Thread Markus Jelsma
Hi,

It seems I cannot escape the equals-sign in the source file for the external
file field. Does anyone know another work-around? Except for not using
values with that character, of course ;)
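
(For context: the source file for an ExternalFileField is a plain text
file of key=value lines, e.g. external_myfield in the index data
directory; the names and values below are illustrative. A value that
itself contains '=' makes the key/value split ambiguous:)

doc1=12.5
doc2=3.0
doc3=a=b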

Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: correctlySpelled and onlyMorePopular in 3.1

2011-05-25 Thread Markus Jelsma
Any thoughts on this one?

On Monday 23 May 2011 17:41:00 Markus Jelsma wrote:
 Hi,
 
 I know about the behaviour of the onlyMorePopular setting: it can return
 suggestions while the actual query is correctly spelled. There is, in my
 opinion, some bad behaviour. Consider the following query, which is
 correctly spelled, yields results, and never gets suggestions:
 
 q=test&spellcheck.onlyMorePopular=false
 <bool name="correctlySpelled">true</bool>
 
 q=test&spellcheck.onlyMorePopular=true
 <bool name="correctlySpelled">false</bool>
 
 Now, also consider the following scenario with onlyMorePopular enabled.
 Both term_a and term_b are correctly spelled and in the index.
 
 q=term_a
 <bool name="correctlySpelled">true</bool>
 <str name="collation">term_b</str>
 
 q=term_b
 <bool name="correctlySpelled">false</bool>
 
 The value of correctlySpelled can be very counter-intuitive when
 onlyMorePopular is enabled, can't it? File an issue or live with it?
 
 Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: problem in setting field attribute in schema.xml

2011-05-25 Thread bryan rasmussen
If you never want to see a result for a field, set stored=false.

Best Regards,
Bryan Rasmussen

On Wed, May 25, 2011 at 2:37 PM, Romi romijain3...@gmail.com wrote:
 In my schema.xml file I made a field's attributes indexed=false and
 stored=true, i.e. I am not indexing this field, but I am still getting
 values for this field in my search results. Why is that so? Any idea?

 -
 Romi



Re: problem in setting field attribute in schema.xml

2011-05-25 Thread Romi
If I set stored=false then it indexes the data but does not show the data
in the search result. But in my case I do not want to index the data for a
field, and to my surprise, even with indexed=false for this field, I am
still able to get that data through the query *:*, but I do not get the
data if I run a filter query like field:value. It's really confusing what
Solr is doing.

-
Romi


Re: problem in setting field attribute in schema.xml

2011-05-25 Thread bryan rasmussen
Surely it indexes the data if you set indexed=true.

If you put some data in the field that is unique to that document and
then search for it, do you get the document? If not, it is because the
field is not indexed. If you search on another field in the same document
but still see the non-indexed field in the results, it is because the
non-indexed field is stored.

Best Regards,
Bryan Rasmussen

On Wed, May 25, 2011 at 3:11 PM, Romi romijain3...@gmail.com wrote:
 If I set stored=false then it indexes the data but does not show the data
 in the search result. But in my case I do not want to index the data for a
 field, and to my surprise, even with indexed=false for this field, I am
 still able to get that data through the query *:*, but I do not get the
 data if I run a filter query like field:value. It's really confusing what
 Solr is doing.

 -
 Romi



Re: problem in setting field attribute in schema.xml

2011-05-25 Thread Romi
If I set the uniqueKey field to indexed=false then it throws the exception
org.apache.solr.common.SolrException: Schema Parsing Failed


And in http://wiki.apache.org/solr/SchemaXml#Fields it is clearly mentioned
that a non-indexed field is not searchable, so why am I getting search
results? Why should stored=true matter if indexed=false?


-
Romi


Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-25 Thread Tanguy Moal

Dear list,

I'm posting here after some unsuccessful investigations.
In my setup I push documents to Solr using the StreamingUpdateSolrServer.

I'm sending a comfortable initial amount of documents (~250M) and wished 
to perform overwriting of duplicated documents at index time, during the 
update, taking advantage of the UpdateProcessorChain.


At the beginning of the indexing stage, everything is quite fast;
documents arrive at a rate of about 1000 docs/s.
The only extra processing during the import is the computation of a couple
of hashes that are used to uniquely identify documents given their
content, using both stock (MD5Signature) and custom (derived from
Lookup3Signature) update processors.
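
(For reference, this is the kind of solrconfig.xml chain involved; the
chain name, signature field, and fields list below are illustrative:)

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,content</str>
    <str name="signatureClass">solr.processor.MD5Signature</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>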

I send a commit command to the server every 500k documents sent.

During a first period, the server is CPU bound. After a short while (~10
minutes), the rate at which documents are received starts to fall
dramatically, the server becoming IO bound.
I first thought of a normal speed decrease during the commit, while my
push client waits for the flush to occur. That would have been a normal
slowdown.


The thing that caught my attention was that, unexpectedly, the server was
performing a lot of small reads, way more than the number of writes, which
seem to be larger.
The combination of the many small reads with the constant amount of
bigger writes seems to be creating a lot of IO contention on my commodity
SATA drive, and the ETA of my index build started to increase scarily =D


I then restarted the JVM with JMX enabled so I could investigate a little
bit more. I then realized that the UpdateHandler was performing many reads
while processing the update request.


Are there any known limitations around the UpdateProcessorChain when
overwriteDupes is set to true?
I turned it off, which of course breaks the intent of my index, but it is
good for comparison purposes.


That did the trick; indexing is fast again, even with the periodic commits.

I therefore have two questions, an interesting first one and a boring
second one:


1 / What's the workflow of the UpdateProcessorChain when one or more
processors have overwriting of duplicates turned on? What happens under
the hood?


I tried to answer that myself by looking at DirectUpdateHandler2, and my
understanding stopped at the following:

- The document is added to the Lucene IndexWriter
- The duplicates are deleted from the Lucene IndexWriter

The dark magic I couldn't understand seems to occur around the idTerm
and updateTerm things in the addDoc method. The deletions seem to be
buffered somewhere; I just didn't get it :-)


I might be wrong since I didn't read the code more than that, but the
point might be how Solr handles deletions, which is something still
unclear to me. In any case, a lot of reads seem to occur for that precise
task, and it tends to produce a lot of IO, killing indexing performance
when overwriteDupes is on. I don't even understand why so many read
operations occur at this stage, since my process has a comfortable amount
of RAM (Xms=Xmx=8GB), with only 4.5GB used so far.


Any help, recommendation or idea is welcome :-)

2 / In case there isn't a simple fix for this, I'll have to live with
duplicates in my index. I don't mind, since Solr offers a great grouping
feature, which I already use in some other applications. The only thing I
don't know yet is: if I rely on grouping at search time, in combination
with the Stats component (which is the intent of that index), and limit
the results to 1 document per group, will the computed statistics take
those duplicates into account or not? In short, how well does the Stats
component behave when combined with hits collapsing?


I had initially implemented my solution using overwriteDupes because it
would have reduced both the target size of my index and the complexity of
the queries used to obtain statistics on the search results, at the same
time.


Thank you very much in advance.

--
Tanguy



Re: problem in setting field attribute in schema.xml

2011-05-25 Thread Yury Kats
On 5/25/2011 9:29 AM, Romi wrote:
 And in http://wiki.apache.org/solr/SchemaXml#Fields it is clearly mentioned
 that a non-indexed field is not searchable, so why am I getting search
 results? Why should stored=true matter if indexed=false?

"indexed" controls whether you can find the document based on the content
of this field.
"stored" controls whether you will see the content of this field in the
result.
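
(A quick illustration in schema.xml terms; the field names are made up:)

<field name="searchable_only" type="text"   indexed="true"  stored="false"/>
<field name="display_only"    type="string" indexed="false" stored="true"/>

The first can match queries but is never returned; the second is returned
with results but cannot be queried directly.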



RE: problem in setting field attribute in schema.xml

2011-05-25 Thread Vignesh Raj
It's very strange. I tried the same now and am getting the same result.
I have set both indexed=false and stored=false.
But still, if I search for a keyword using my default search, I get
results in these fields as well.
But if I specify field:value, it shows 0 results.

Can anyone explain?

Regards
Vignesh


-Original Message-
From: Romi [mailto:romijain3...@gmail.com] 
Sent: 25 May 2011 18:42
To: solr-user@lucene.apache.org
Subject: Re: problem in setting field attribute in schema.xml

If I set stored=false then it indexes the data but does not show the data
in the search result. But in my case I do not want to index the data for a
field, and to my surprise, even with indexed=false for this field, I am
still able to get that data through the query *:*, but I do not get the
data if I run a filter query like field:value. It's really confusing what
Solr is doing.

-
Romi



Re: Escaping equals-sign in external file field

2011-05-25 Thread Markus Jelsma
Created issue and added simple patch:
https://issues.apache.org/jira/browse/SOLR-2545

On Wednesday 25 May 2011 14:55:34 Markus Jelsma wrote:
 Hi,
 
 It seems I cannot escape the equals-sign in the source file for the
 external file field. Does anyone know another work-around? Except for not
 using values with that character, of course ;)
 
 Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: problem in setting field attribute in schema.xml

2011-05-25 Thread Jan Høydahl
You probably got tricked by an old index which was created while you had
stored=true.

Delete your index, restart Solr, re-index your content and try again.

Solr will happily serve what's in the Lucene index even if it does not match
your current schema; that's why it's important to re-index everything if you
make changes to the schema and want those changes to be visible.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 25. mai 2011, at 15.47, Vignesh Raj wrote:

 It's very strange. I tried the same now and am getting the same result.
 I have set both indexed=false and stored=false.
 But still, if I search for a keyword using my default search, I get
 results in these fields as well.
 But if I specify field:value, it shows 0 results.
 
 Can anyone explain?
 
 Regards
 Vignesh
 
 
 



Re: problem in setting field attribute in schema.xml

2011-05-25 Thread Michael Lackhoff

Am 25.05.2011 15:47, schrieb Vignesh Raj:

It's very strange. I tried the same now and am getting the same result.
I have set both indexed=false and stored=false.
But still, if I search for a keyword using my default search, I get
results in these fields as well.
But if I specify field:value, it shows 0 results.

Can anyone explain?


I guess you copy the field to your default search field.
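
(That is, a schema.xml directive along these lines; the names are made up.
copyField copies the raw input value, so the content becomes searchable
through the destination field even when the source field itself has
indexed=false:)

<copyField source="myfield" dest="text"/>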

-Michael


copyField generates multiple values encountered for non multiValued field

2011-05-25 Thread Alexander Golubowitsch

Dear list,
 
hope somebody can help me understand/avoid this.
 
I am sending an add request with allowDuplicates=false to a Solr 1.4.1
instance.
This is for debugging purposes, so I am sending the exact same data that is
already stored in Solr's index.
I am using the PHP PECL libraries, which fail completely at giving me any
hint on what goes wrong.

Only sending the same add request again gives me a proper
SolrClientException that hints:

ERROR: [288400] multiple values encountered for non multiValued field
field2 [fieldvalue, fieldvalue]

The scenario:
- field1 is implicitly single-valued, type text, indexed and stored
- field2 is generated via a copyField directive in schema.xml, implicitly
single-valued, type string, indexed and stored
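
(In schema.xml terms, the scenario corresponds to something like this
sketch; the types and names are illustrative:)

<field name="field1" type="text" indexed="true" stored="true"/>
<field name="field2" type="string" indexed="true" stored="true"/>
<copyField source="field1" dest="field2"/>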

What appears to happen:
- On the first add (SolrClient::addDocuments(array(SolrInputDocument
theDocument))), regular fields like field1 get overwritten as intended
- field2, defined via copyField but still single-valued, gets _appended_
to instead
- When I retrieve the updated document in a query and try to add it again,
it won't let me because of the inconsistent multi-value state
- The PECL library, in addition, appears to hit some internal exception
(which it doesn't handle properly) when encountering multiple values for a
single-valued field. That gives me zero results when querying a set that
includes the document via PHP, while the document can be retrieved
properly, though in an inconsistent state, any other way.

But: Solr appears to be generating the corrupted state itself via
copyField?
What's going wrong? I'm pretty confused...

Thank you,
 Alex



Re: very slow commits and overlapping commits

2011-05-25 Thread Bill Au
I am taking a snapshot after every commit.  From looking at the snapshots,
it does not look like the delay is caused by segment merging, because I am
not seeing any large new segments after a commit.

I still can't figure out why there is a 2 minute gap between "start commit"
and SolrDeletionPolicy.onCommit.  Will changing the deletion policy make
any difference?  I am using the default deletion policy now.

Bill

2011/5/21 Erick Erickson erickerick...@gmail.com

 Well, committing less often is a possibility <g>. Here's what's probably
 happening: when you pass certain thresholds, segments are merged, which can
 take quite some time.  How are you triggering commits? If it's external,
 think about using autoCommit instead.

 Best
 Erick
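
 (For reference, autoCommit is configured in solrconfig.xml inside the
 updateHandler section; the thresholds below are illustrative:)

 <autoCommit>
   <maxDocs>10000</maxDocs>  <!-- commit after this many added docs -->
   <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
 </autoCommit>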
 On May 20, 2011 6:04 PM, Bill Au bill.w...@gmail.com wrote:
  On my Solr 1.4.1 master I am doing commits regularly at a fixed interval.
  I noticed that from time to time a commit will take longer than the commit
  interval, causing commits to overlap. Then things get worse as commits
  take longer and longer. Here are the logs for a long commit:
 
 
  [2011-05-18 23:47:30.071] start
 

 commit(optimize=false,waitFlush=false,waitSearcher=false,expungeDeletes=false)
  [2011-05-18 23:49:48.119] SolrDeletionPolicy.onCommit: commits:num=2
  [2011-05-18 23:49:48.119]
 

 commit{dir=/var/opt/resin3/5062/solr/data/index,segFN=segments_5cpa,version=1247782702272,generation=249742,filenames=[_4dqu_2g.del,
  _4e66.tis, _4e3r.tis, _4e59.nrm, _4e68_1.del, _4e4n.prx, _4e4n.fnm,
  _4e67.fnm, _4e3r.frq, _4e3r.tii, _4e6d.fnm, _4e6c.prx, _4e68.fdx,
 _4e68.nrm,
  _4e6a.frq, _4e68.fdt, _4dqu.fnm, _4e4n.tii, _4e69.fdx, _4e69.fdt,
 _4e0e.nrm,
  _4e4n.tis, _4e6e.fnm, _4e3r.prx, _4e66.fnm, _4e3r.nrm, _4e0e.prx,
 _4e4c.fdx,
  _4dx1.prx, _4e5v.frq, _4e3r.fdt, _4e4c.tis, _4e41_6.del, _4e6b.tis,
  _4e6b_1.del, _4e4y_3.del, _4e6b.tii, _4e3r.fdx, _4dx1.nrm, _4e4y.frq,
  _4e4c.fdt, _4e4c.tii, _4e6d.fdt, _4e5k.fnm, _4e41.fnm, _4e69.fnm,
 _4e67.fdt,
  _4e0e.tii, _4dty_h.del, _4e6b.fnm, _4e0e_h.del, _4e6d.fdx, _4e67.fdx,
  _4e0e.tis, _4e5v.nrm, _4dx1.fnm, _4e5v.tii, _4dqu.fdt, segments_5cpa,
  _4e5v.prx, _4dqu.fdx, _4e59.fnm, _4e6d.prx, _4e59_5.del, _4e4c.prx,
  _4e4c.nrm, _4e5k.prx, _4e66.fdx, _4dty.frq, _4e6c.frq, _4e5v.tis,
 _4e6e.tii,
  _4e66.fdt, _4e6b.fdx, _4e68.prx, _4e59.fdx, _4e6e.fdt, _4e41.prx,
 _4dx1.tii,
  _4dx1.fdt, _4e6b.fdt, _4e5v_4.del, _4e4n.fdt, _4e6e.fdx, _4dx1.fdx,
  _4e41.nrm, _4e4n.fdx, _4e6e.tis, _4e66.tii, _4e4c.fnm, _4e6b.prx,
 _4e67.prx,
  _4e0e.fnm, _4e4n.nrm, _4e67.nrm, _4e5k.nrm, _4e6a.prx, _4e68.fnm,
  _4e4c_4.del, _4dx1.tis, _4e6e.nrm, _4e59.tii, _4e68.tis, _4e67.frq,
  _4e3r.fnm, _4dty.nrm, _4e4y.prx, _4e6e.prx, _4dty.tis, _4e4y.tis,
 _4e6b.nrm,
  _4e6a.fdt, _4e4n.frq, _4e6d.frq, _4e59.fdt, _4e6a.fdx, _4e6a.fnm,
 _4dqu.tii,
  _4e41.tii, _4e67_1.del, _4e41.tis, _4dty.fdt, _4e69.tis, _4dqu.frq,
  _4dty.fdx, _4dx1.frq, _4e6e.frq, _4e66_1.del, _4e69.prx, _4e6d.tii,
  _4e5k.tii, _4e0e.fdt, _4dqu.tis, _4e6d.tis, _4e69.nrm, _4dqu.prx,
 _4e4y.fnm,
  _4e67.tis, _4e69_1.del, _4e6d.nrm, _4e6c.tis, _4e0e.fdx, _4e6c.tii,
  _4dx1_n.del, _4e5v.fnm, _4e5k.tis, _4e59.tis, _4e67.tii, _4dqu.nrm,
  _4e5k_8.del, _4e6c.fdx, _4e6c.fdt, _4e41.frq, _4e4y.fdx, _4e69.frq,
  _4e6a.tis, _4dty.prx, _4e66.frq, _4e5k.frq, _4e6a.tii, _4e69.tii,
 _4e6c.nrm,
  _4dty.fnm, _4e59.prx, _4e59.frq, _4e66.prx, _4e68.frq, _4e5k.fdx,
 _4e4y.tii,
  _4e6c.fnm, _4e0e.frq, _4e6b.frq, _4e41.fdt, _4e4n_2.del, _4dty.tii,
  _4e4y.fdt, _4e66.nrm, _4e4c.frq, _4e6a.nrm, _4e5k.fdt, _4e3r_i.del,
  _4e5v.fdt, _4e4y.nrm, _4e68.tii, _4e5v.fdx, _4e41.fdx]
  [2011-05-18 23:49:48.119]
 

 commit{dir=/var/opt/resin3/5062/solr/data/index,segFN=segments_5cpb,version=1247782702273,generation=249743,filenames=[_4dqu_2g.del,
  _4e66.tis, _4e59.nrm, _4e3r.tis, _4e4n.fnm, _4e67.fnm, _4e3r.tii,
 _4e6d.fnm,
  _4e68.fdx, _4e68.fdt, _4dqu.fnm, _4e4n.tii, _4e69.fdx, _4e69.fdt,
 _4e4n.tis,
  _4e6e.fnm, _4e0e.prx, _4e4c.tis, _4e5v.frq, _4e4y_3.del, _4e6b_1.del,
  _4e4c.tii, _4e6f.fnm, _4e5k.fnm, _4e6c_1.del, _4e41.fnm, _4dx1.fnm,
  _4e5v.nrm, _4e5v.tii, _4e5v.prx, _4e5k.prx, _4e4c.nrm, _4dty.frq,
 _4e66.fdx,
  _4e5v.tis, _4e66.fdt, _4e6e.tii, _4e59.fdx, _4e6b.fdx, _4e41.prx,
 _4e6b.fdt,
  _4e41.nrm, _4e6e.tis, _4e4c.fnm, _4e66.tii, _4e6b.prx, _4e0e.fnm,
 _4e5k.nrm,
  _4e6a.prx, _4e6e.nrm, _4e59.tii, _4e67.frq, _4dty.nrm, _4e4y.tis,
 _4e6a.fdt,
  _4e6b.nrm, _4e59.fdt, _4e6a.fdx, _4e41.tii, _4e41.tis, _4e67_1.del,
  _4dty.fdt, _4dty.fdx, _4e69.tis, _4e66_1.del, _4e6e.frq, _4e5k.tii,
  _4dqu.prx, _4e67.tis, _4e69_1.del, _4e6c.tis, _4e6c.tii, _4e5v.fnm,
  _4e5k.tis, _4e59.tis, _4e67.tii, _4e6c.fdx, _4e4y.fdx, _4e41.frq,
 _4e6c.fdt,
  _4dty.prx, _4e66.frq, _4e69.tii, _4e6c.nrm, _4e59.frq, _4e66.prx,
 _4e5k.fdx,
  _4e68.frq, _4e4y.tii, _4e4n_2.del, _4e41.fdt, _4e6b.frq, _4e4y.fdt,
  _4e66.nrm, _4e4c.frq, _4e3r_i.del, _4e5k.fdt, _4e4y.nrm, _4e41.fdx,
  _4e4n.prx, _4e68_1.del, _4e3r.frq, _4e6f.fdt, _4e6f.fdx, _4e6c.prx,
  _4e68.nrm, _4e6a.frq, 

RE: problem in setting field attribute in schema.xml

2011-05-25 Thread Vignesh Raj
I tried deleting the index and re-indexing, but I still get the same
result.

Regards
Vignesh

-Original Message-
From: Jan Høydahl [mailto:jan@cominvent.com] 
Sent: 25 May 2011 19:30
To: solr-user@lucene.apache.org
Subject: Re: problem in setting field attribute in schema.xml

You probably got tricked by an old index which was created while you had
stored=true.

Delete your index, restart Solr, re-index your content and try again.

Solr will happily serve what's in the Lucene index even if it does not match
your current schema; that's why it's important to re-index everything if
you make changes to the schema and want those changes to be visible.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




Similarity per field

2011-05-25 Thread Brian Lamb
Hi all,

I sent a mail about this topic a week ago, but now that I have more
information about what I am doing, as well as a better understanding of how
the similarity class works, I wanted to start a new thread with a bit more
information about what I'm doing, what I want to do, and how I can make it
work correctly.

I have written a similarity class that I would like applied to a specific
field.

This is how I am defining the fieldType:

<fieldType name="edgengram_cust" class="solr.TextField"
    positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
        maxGramSize="1" side="front" />
  </analyzer>
  <similarity class="my.package.similarity.MySimilarity"/>
</fieldType>

And then I assign a specific field to that fieldType:

<field name="myfield" multiValued="true" type="edgengram_cust"
    indexed="true" stored="true" required="false" omitNorms="true" />

Then I restarted Solr and did a full-import. However, the changes I have
made do not appear to be taking hold. For simplicity, right now I just have
the idf function returning 1. When I do a search with debugQuery=on, the
idf behaves as it normally does. However, when I search on this field, the
idf should be 1, and that is not the case.

To try and nail down where the problem occurs, I commented out the
similarity class definition in the fieldType and added it globally to the
schema file:

<similarity class="my.package.similarity.MySimilarity"/>

Then I restarted Solr and did a full-import. This time, the idf scores were
all 1. So it seems the problem is not with my similarity class but with
trying to apply it to a specific fieldType.

According to https://issues.apache.org/jira/browse/SOLR-2338, this should
be in trunk now, yes? I have run svn up on both my lucene and solr checkouts
and it still is not recognizing it on a per-field basis.

Is the tag different inside a fieldType? Did I not update solr correctly?
Where is my mistake?

Thanks,

Brian Lamb


communication protocol between master and slave

2011-05-25 Thread antoniosi
Hi,

I am just curious: what is the communication protocol through which a slave
node gets index updates from the master node in a replication setup? Is it
through TCP? I assume it only gets the delta?

Thanks very much in advance.




Re: DIH import and postImportDeleteQuery

2011-05-25 Thread Alexandre Rocco
Hi Ephraim,

Thank you so much for the input.
I was able to find your thread on the archives and got your solution to
work.

In fact, using $deleteDocById and $skipDoc worked like a charm. This
feature is very useful; it's a shame it's not properly documented.
The only downside is the one you mentioned: the stats are not updated, so
if I update 13 documents and delete 2, DIH will tell me that only 13
documents were processed. This is bad in my case, because I check the end
result to generate an error e-mail if needed.
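
(For reference: the trick relies on DIH's special commands, where a row
returned by the entity can carry $deleteDocById and/or $skipDoc keys. With
a SQL source that means returning extra columns; a sketch with hypothetical
column and status names:)

SELECT entityid,
       field1,
       -- for rows flagged as deleted, emit the DIH special commands:
       CASE WHEN status = 1 THEN entityid END AS [$deleteDocById],
       CASE WHEN status = 1 THEN 'true' END AS [$skipDoc]
FROM some_table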

You also mentioned that if the query contains only deletion records, a
commit will not be executed automatically and it would be necessary to
commit manually.

How can I commit manually via DIH? I was not able to find any references in
the documentation.

Thanks!
Alexandre

On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir ephra...@icq.com wrote:

 Search the list for my post "DIH - deleting documents, high performance
 (delta) imports, and passing parameters" which shows my solution to a
 similar problem.

 Ephraim Ofir




RE: DIH import and postImportDeleteQuery

2011-05-25 Thread Dyer, James
The "failure to commit" bug with $deleteDocById can be fixed by applying
patch SOLR-2492.  This patch also partially fixes the "no updated stats"
bug in that it increments 1 for every call to $deleteDocById and
$deleteDocByQuery.  Note that this might result in inaccurate counts if the
id given with $deleteDocById doesn't exist or is duplicated.  Obviously
this is not a complete fix for stats using $deleteDocByQuery, as this
command would normally be used to delete more than 1 doc at a time.

The patch is for trunk, but it might work with 3.1 also.  If not, it likely
only needs minor tweaking.

The jira ticket is here:  https://issues.apache.org/jira/browse/SOLR-2492

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alexandre Rocco [mailto:alel...@gmail.com] 
Sent: Wednesday, May 25, 2011 12:54 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH import and postImportDeleteQuery


Re: communication protocol between master and slave

2011-05-25 Thread Jonathan Rochkind
I'm pretty sure it's over HTTP, although I don't know the details of the 
requests/responses.


The slave will download any index files that have changed on the master. A
Solr index is split up among a number of separate files on disk.
There's no way for the slave to get a delta beyond getting a complete index
file if and only if it has changed: index files that haven't changed won't
be downloaded, and index files that are new will be downloaded. (I think
'new' is basically the same as 'changed'; I am not sure index files are
ever actually changed, rather than new ones being created and old ones
being deleted after a merge/optimize operation.)


One side effect of this is that if an 'optimize' is run on the master, then
typically all index files will have to be downloaded. (Likewise, if an
optimize is run on the slave, all index files will be downloaded on the
next replication. There's generally no good reason to run an optimize on
the slave, or otherwise do anything to change the index at all on the
slave.)
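
(For reference, this HTTP-based replication is configured through the
ReplicationHandler in solrconfig.xml; the host name and poll interval
below are illustrative:)

<!-- on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- on the slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>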


On 5/25/2011 1:11 PM, antoniosi wrote:

Hi,

I am just curious: what is the communication protocol through which a slave
node gets index updates from the master node in a replication setup? Is it
through TCP? I assume it only gets the delta?

Thanks very much in advance.





Re: Similarity per field

2011-05-25 Thread Brian Lamb
I looked at the patch page and saw the files that were changed. I went into
my install and looked at those same files and found that they had indeed
been changed. So it looks like I have the correct version of Solr.

On Wed, May 25, 2011 at 1:01 PM, Brian Lamb
brian.l...@journalexperts.com wrote:




Re: DIH import and postImportDeleteQuery

2011-05-25 Thread Alexandre Rocco
Hi James,

Thanks for the heads-up!
I am currently on version 1.4.1, so I can apply this patch and see if it
works.
I just need to assess whether it's best to apply the patch, or to check on
the backend system whether only delete requests were generated and then not
call DIH.

Previously, I found another open issue, created by Ephraim:
https://issues.apache.org/jira/browse/SOLR-2104

It's the same issue, but it hasn't had any updates yet.

Regards,
Alexandre

On Wed, May 25, 2011 at 3:17 PM, Dyer, James james.d...@ingrambook.com wrote:


RE: DIH import and postImportDeleteQuery

2011-05-25 Thread Dyer, James
Great.  I wasn't aware of the other issue.  I put a link between the 2
issues in JIRA so people will know in the future.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alexandre Rocco [mailto:alel...@gmail.com] 
Sent: Wednesday, May 25, 2011 2:34 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH import and postImportDeleteQuery


Edgengram

2011-05-25 Thread Brian Lamb
Hi all,

I'm running into some confusion with the way EdgeNGram works. I have the
field set up as:

<fieldType name="edgengram" class="solr.TextField"
    positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
        maxGramSize="100" side="front" />
  </analyzer>
</fieldType>

I've also set up my own similarity class that returns 1 as the idf score.
What I've found is that if I match a string "abcdefg" against a field
containing "abcdefghijklmnop", then the idf will score that as a 7:

7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2)

I get why that's happening, but is there a way to avoid it? Do I need to
create a new field type to achieve the desired effect?

Thanks,

Brian Lamb


Re: Termscomponent sort question

2011-05-25 Thread antonio
Help me please...



Re: communication protocol between master and slave

2011-05-25 Thread antoniosi
Thanks for the prompt reply.



indexing numbers

2011-05-25 Thread antoniosi
Hi,

How does Solr index a numeric value? Does it index it as a string, or does
it keep it as a numeric value?

Thanks.



Re: indexing numbers

2011-05-25 Thread Rob Casson
The default schema.xml provided in the Solr distribution is
well-documented, and a good place to get started (including numeric
fieldTypes):

 http://wiki.apache.org/solr/SchemaXml

Lucid Imagination also provides a nice reference guide:

 
http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr/Reference-Guide
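
(for illustration, the example schema defines trie-based numeric types
along the lines of the sketch below, which store numbers in a sortable,
range-query-friendly binary form rather than as plain strings:)

<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
    omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8"
    omitNorms="true" positionIncrementGap="0"/>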

hope that helps,
rob

On Wed, May 25, 2011 at 6:20 PM, antoniosi antonio...@gmail.com wrote:
 Hi,

 How does Solr index a numeric value? Does it index it as a string, or does
 it keep it as a numeric value?

 Thanks.




Minimum Should Match not enforced with External Field + Function Query with boost

2011-05-25 Thread fbytes
Hello

Minimum Should Match does not seem to be working when I am using the boost
with external field scoring (I followed the
http://dev.tailsweep.com/solr-external-scoring/ example to implement
external field scoring).

I am using a month-old Solr trunk build (4.0).

Thanks for the help.
Ajay


Here are the input parameters to the dismax request handler:
d=6&sfield=latlon&group.main=true&wt=json&rows=10&debugQuery=true&fl=*,score&start=0&q={!boost+b=dishRating+v=$qq}&pt=42.35864,-71.05666&group.field=resname&group=true&qq=hot+chicken+wings&fq={!bbox}

The mm parameter is defined in the defaults list (solrconfig.xml) as
<str name="mm">3</str>

Debug information:
[debug] = array(11) {
[rawquerystring] = string(27) {!boost b=dishRating v=$qq}
[querystring] = string(27) {!boost b=dishRating v=$qq}
[parsedquery] = string(249) BoostedQuery(boost(text:hot
(text:chicken text:chickn text:poultri text:murgh text:pollo) (text:wing
text:wingett),FileFloatSource(field=dishRating,keyField=id,defVal=0.0,dataDir=/solr/dish/data/)))
[parsedquery_toString] = string(235) boost(text:hot (text:chicken
text:chickn text:poultri text:murgh text:pollo) (text:wing
text:wingett),FileFloatSource(field=dishRating,keyField=id,defVal=0.0,dataDir=/solr/dish/data/))
[explain] = array(10) {
  [US-MA-2256-862-240311] = string(1397) 
0.62424326 = (MATCH) boost(text:hot (text:chicken text:chickn text:poultri
text:murgh text:pollo) (text:wing
text:wingett),FileFloatSource(field=dishRating,keyField=id,defVal=0.0,dataDir=/solr/dish/data/)),
product of:
  0.15606081 = (MATCH) product of:
0.23409122 = (MATCH) sum of:
  0.13103496 = (MATCH) weight(text:hot in 221464), product of:
0.18647969 = queryWeight(text:hot), product of:
  4.497132 = idf(docFreq=16595, maxDocs=548010)
  0.04146636 = queryNorm
0.70267683 = (MATCH) fieldWeight(text:hot in 221464), product of:
  1.0 = tf(termFreq(text:hot)=1)
  4.497132 = idf(docFreq=16595, maxDocs=548010)
  0.15625 = fieldNorm(field=text, doc=221464)
  0.103056274 = (MATCH) sum of:
0.103056274 = (MATCH) weight(text:chicken in 221464), product of:
  0.11693921 = queryWeight(text:chicken), product of:
2.8200984 = idf(docFreq=88782, maxDocs=548010)
0.04146636 = queryNorm
  0.8812808 = (MATCH) fieldWeight(text:chicken in 221464), product
of:
2.0 = tf(termFreq(text:chicken)=4)
2.8200984 = idf(docFreq=88782, maxDocs=548010)
0.15625 = fieldNorm(field=text, doc=221464)
0.667 = coord(2/3)
  4.0 =
float(dishRating{type=dishRatingFile,properties=omitTermFreqAndPositions})=4.0

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Minimum-Should-Match-not-enforced-with-External-Field-Function-Query-with-boost-tp2985564p2985564.html
Sent from the Solr - User mailing list archive at Nabble.com.
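
A hedged observation, not confirmed anywhere in this thread: the {!boost}
parser hands its wrapped query to the default lucene parser, so dismax
settings such as mm from the handler defaults may never reach the inner
query; the debug output above shows plain optional boolean clauses and
coord(2/3) rather than a dismax structure, which points the same way. One
way to test this is to name the inner parser explicitly:

q={!boost b=dishRating v=$qq}&qq={!dismax qf=text mm=3}hot+chicken+wings

If mm is then enforced, the external-field boost itself was never the
problem; it was only the parser the boosted sub-query fell back to.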


Re: Special character in a field used by sort parameter

2011-05-25 Thread Joey
Marc SCHNEIDER marc.schneider73 at gmail.com writes:

 
 Hi,
 
 I have a field called test-id but I can't use it when sorting, for example :
 Doesn't work : (undefined field test)
 http://localhost:8180/solr/test-public/select/?q=test-id:1&sort=test-id+asc
 http://localhost:8180/solr/test-public/select/?q=test-id:1&sort=test\-id+asc
 
 When removing the sort parameter then it works...
 
 Is there a way of escaping the field name in sort parameter?
 
 Thanks in advance,
 Marc.
 


I've also got a similar issue. When the field name has a hyphen and the first
character is alphabetical, upon sorting solr says my field is undefined. 

a) It sorts fine when the first character is numerical, and
b) I've tried encoding the url but hyphens don't encode.

If anyone has a fix, I would be stoked to hear it.

J
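
A common workaround, offered here as an editorial aside (the schema lines
are illustrative, not from the thread): one plausible cause is that the sort
spec is run through the function-query parser, where the hyphen reads as a
minus operator, which would explain the "undefined field test" error.
Keeping a hyphen-free copy of the field just for sorting sidesteps it:

<field name="test-id" type="string" indexed="true" stored="true" />
<field name="test_id" type="string" indexed="true" stored="false" />
<copyField source="test-id" dest="test_id" />

and then sorting on the copy:

http://localhost:8180/solr/test-public/select/?q=test-id:1&sort=test_id+asc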



Tools?

2011-05-25 Thread Sujatha Arun
Hello,

Are there any tools that can be used for analyzing the solr logs?

Regards
Sujatha


Re: Termscomponent sort question

2011-05-25 Thread Dmitry Kan
Hi antonio,

Can you sort yourself on client side?

Are you trying to sort the terms with the same count in reverse order of
their lengths?

On Tue, May 24, 2011 at 8:18 PM, antonio antonio...@email.it wrote:

 Hi, i use solr 3.1.
 I implemented my autocomplete with TermsComponent. I am looking for a way,
 if there is one, to sort the terms found by score.
 For example, if there are two terms, "Rome" and "Near Rome", that have the
 same count (that is, 1), I would like "Rome" to come before "Near Rome".
 Because the count is the same, if I use index order as the sort, "Near
 Rome" is lexically before "Rome".

 Is there a way to use score like in dismax for TermsComponent? Using
 dismax, for example, if I search Rome, the word "Rome" gets a higher score
 than "Near Rome". I would like the same behavior with TermsComponent.

 Is it possible?

 Thanks.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Termscomponent-sort-question-tp2980683p2980683.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,

Dmitry Kan
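
For reference (an editorial sketch, not from the thread): TermsComponent
itself only knows two orders, terms.sort=count and terms.sort=index, so any
score-like ranking has to happen on the client after the terms come back. A
typical autocomplete request against the example /terms handler looks like:

http://localhost:8983/solr/terms?terms=true&terms.fl=name&terms.prefix=rom&terms.sort=count&terms.limit=20

The client can then break count ties itself, for example by preferring
terms that equal or start with the typed prefix, or shorter terms, before
redisplaying the list.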


analyzer type - does it default to index or query?

2011-05-25 Thread Andy
Hi,

When specifying an analyzer for a fieldType, I can say type="index" or
type="query".

What if I don't specify the type for an analyzer? Does it default to index or
query or both?

Thanks.
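
For reference (an editorial addition; the fieldType below is a placeholder,
not from this thread): an analyzer declared without a type attribute is used
for both indexing and querying, so these two definitions behave the same:

<fieldType name="text_simple" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
</fieldType>

<fieldType name="text_simple" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
</fieldType>

Splitting them only matters when the two sides should differ, for example
applying synonyms or ngrams on one side only.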


Re: problem in setting field attribute in schema.xml

2011-05-25 Thread Romi
indexed controls whether you can find the document based on the content of
this field.
stored controls whether you will see the content of this field in the
result.


ya... but when I set indexed="false" for a particular field and I search as
*:*, it searches all documents, that's true; but what I think is that the
index should no longer contain the field which I set as indexed="false".
For example, in a document the fields are id, author, title, and for the
author field I set indexed="false"; then author should not be indexed, and
when I perform a search as *:* it should show all documents as

<doc>
<string name="id">id1</string>
<string name="title">t1</string>
<string name="author">a1</string>
</doc>

but when I search author:a1, results are still found, when 0 results should
be found. Why so? To be very clear, I am performing a full-import where new
indexes are created every time; to be on the safe side I also deleted the
indexes and recreated them, and I am still facing the same problem.

-
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-in-setting-field-attribute-in-schema-xml-tp2984126p2987530.html
Sent from the Solr - User mailing list archive at Nabble.com.
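
An editorial sketch of what the two attributes do, reusing the field names
from this thread; none of this is from the original exchange:

<field name="author" type="string" indexed="false" stored="true" />

With this definition, after a clean re-index the stored value still appears
inside every returned document (stored="true"), but a query such as
author:a1 can never match, because no terms for the field exist in the
index (indexed="false"). Two things commonly explain "it still matches":
schema.xml is only read when the core loads, so Solr must be restarted (or
the core reloaded) before re-indexing, and segments built under the previous
schema keep their indexed terms until every document is rewritten.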


FieldCache

2011-05-25 Thread Jean-Sebastien Vachon
Hi All,

 

Since there is no way of controlling the size of Lucene's internal
FieldCache, how can we make sure that we are making good use of it? One of
my shards has close to 1.5M documents and the fieldCache only contains about
10 elements.

 

Is there anything we can do to control this?

 

Thanks
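
An editorial note, hedged: the FieldCache cannot be sized or tuned directly;
it is populated lazily, with one entry per field/type combination the first
time that field is used for sorting, single-valued faceting, or a function
query. About 10 entries against 1.5M documents therefore just means about 10
such combinations have been exercised so far. In the 1.4/3.x admin UI the
current entries (and the insanity count) are visible on the statistics page,
for example:

http://localhost:8983/solr/admin/stats.jsp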



What is omitNorms

2011-05-25 Thread Romi
hi, I want to know what omitNorms does for a field in schema.xml, and what
its effect on indexing and searching will be if I set it to true or false.
Please suggest a suitable example.

-
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-omitNorms-tp2987547p2987547.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What is omitNorms

2011-05-25 Thread Romi
And I also wanted to know: what is the difference between setting omitNorms
on the fieldType and setting it on the field?

-
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-omitNorms-tp2987547p2987562.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What is omitNorms

2011-05-25 Thread Chandan Tamrakar
This is an advanced option. Please see the details at the following link:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr#d0e71



On Thu, May 26, 2011 at 11:12 AM, Romi romijain3...@gmail.com wrote:

 and i also wanted to know  what is difference if i set omitNorms in
 fieldType
 or if i set it in field.

 -
 Romi
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/What-is-omitNorms-tp2987547p2987562.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Chandan Tamrakar
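
An editorial sketch, not from the thread: norms are a per-document byte per
field that folds together index-time boosts and field-length normalization;
omitNorms="true" drops that byte and makes scoring ignore field length. The
attribute can sit on either the fieldType or the field, and a field-level
setting overrides the fieldType default (the names below are illustrative):

<fieldType name="text_nonorm" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
</fieldType>

<!-- overrides the type-level default: this field keeps norms -->
<field name="title" type="text_nonorm" indexed="true" stored="true" omitNorms="false" />

Typical practice is to omit norms on short, structured fields (ids, codes,
facet fields), where length normalization adds nothing, and to keep them on
free-text fields, where shorter matching documents should score higher.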


Re: problem in setting field attribute in schema.xml

2011-05-25 Thread Romi
Even though I am running the command for a full-import, and I have also
deleted the old indexes and recreated them, and I am not using
defaultSearchField or the copyField attribute, I am still getting search
results for the field which I set as indexed="false". Really strange;
please help me get rid of this problem. Thanks.

-
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-in-setting-field-attribute-in-schema-xml-tp2984126p2987628.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query on facet field¹s count

2011-05-25 Thread rajini maski
Sorry for the late reply to this thread.

I implemented the same patch (SOLR-2242) in Solr 1.4.1. Now I am able to
get the distinct facet terms count across a single index. But this does not
work for distributed processing (sharding). Is there a recent patch that
has the same functionality for distributed processing?


It works for the below query:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=StudyID&facet.mincount=1&facet.limit=-1&f.StudyID.facet.namedistinct=1


It doesn't work for:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=StudyID&facet.mincount=1&facet.limit=-1&f.StudyID.facet.namedistinct=1&shards=localhost:8090/solr2

It gets the matched result set from both cores, but the facet results come
only from the first core.

Rajani


On Sat, Mar 12, 2011 at 10:35 AM, rajini maski rajinima...@gmail.com wrote:

 Thanks Bill Bell. This query works after applying the patch you referred
 to, is that right? Please can you let me know how I need to update the
 current war (apache solr 1.4.1) file with this new patch? Thanks a lot.

 Thanks,
 Rajani

 On Sat, Mar 12, 2011 at 8:56 AM, Bill Bell billnb...@gmail.com wrote:


 http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=StudyID&facet.mincount=1&facet.limit=-1&f.StudyID.facet.namedistinct=1

 Would do what you want I believe...



 On 3/11/11 8:51 AM, Bill Bell billnb...@gmail.com wrote:

 There is my patch to do that. SOLR-2242
 
 Bill Bell
 Sent from mobile
 
 
 On Mar 11, 2011, at 1:34 AM, rajini maski rajinima...@gmail.com wrote:
 
  Query on facet field results...
 
 
   When I run a facet query on some field, say facet=on & facet.field=StudyID,
  I get a list of distinct StudyID values with a count that tells how many
  times each study occurred in the search query. But I also need the count
  of this list of distinct StudyIDs. Is there any solr query to get it?
 
 
 
  Example:
 
 
 
   <lst name="facet_fields">
     <lst name="StudyID">
       <int name="105">135164</int>
       <int name="179">79820</int>
       <int name="107">70815</int>
       <int name="120">37076</int>
       <int name="134">35276</int>
     </lst>
   </lst>
 
 
 
  I want a count attribute that returns the number of different StudyIDs
  that occurred. In the above example it would be: Count = 5
  (105, 179, 107, 120, 134)
 
 
 
  <lst name="facet_fields">
    <lst name="StudyID" COUNT="5">
      <int name="105">135164</int>
      <int name="179">79820</int>
      <int name="107">70815</int>
      <int name="120">37076</int>
      <int name="134">35276</int>
    </lst>
  </lst>