Re: disable caches in real time

2010-05-19 Thread Marco Martinez
Hi Chris,

Thank you for your answer.

I've always understood that if you do a commit (replication does one), a new
searcher is opened, and you lose performance (queries per second) while the
caches are regenerated. I think I didn't explain my situation correctly
before: with my scheme I want to avoid this loss of performance in an
environment with frequent updates.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/5/18 Chris Hostetter hossman_luc...@fucit.org

 : I want to know if there is any approach to disable caches in a specific core
 : from a multicore server.

 only via the config.

 : I have a multicore server where core0 will be listening to the queries and
 : another core (core1) will be replicated from a master server. Once the
 : replication has been done, I will swap the cores. My point is that I want to
 : disable the caches in the core that is in charge of the replication to save
 : memory in the machine.

 that seems bizarrely complicated -- replication can work against a live
 core, no need to do the swap yourself, the replicationHandler takes care
 of this for you transparently (ie: you have one core, replicating from a
 master -- the old index will be searched by users, and have caches, and
 when the new version of the index is ready, the replication handler will
 swap the *index* in that core, but the core itself never changes) ... it
 can even autowarm the caches on the new index for you before the swap if
 you configure it that way.

 -Hoss




jmx issue with solr

2010-05-19 Thread Na_D

Hi,

I am trying to start solr with the following command :

java -Dsolr.solr.home=./example-DIH/solr/ -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=3000


On doing so an error is reported :

Error: Password file read access must be restricted: C:\Program Files\Java\jdk1.6.0_18\jre\lib\management\jmxremote.password


The jmxremote.password file is there in the lib\management folder and it
has been set to read-only, but the error persists. I am using Windows XP SP3
Version 2002, mentioning it in case it is of any help.
Please do put in your suggestions.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/jmx-issue-with-solr-tp828478p828478.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Storing RandomSortField

2010-05-19 Thread Marco Martinez
Hi Alexandre,

I am not totally sure about this, but the random sort field is only used to
do a random sort on your searches, and you have to pass different values to
get different sorts. It only applies at search time, so no value is
indexed. You will find more information here:
http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/5/18 Alexandre Rocco alel...@gmail.com

 Hi guys,

 Is there any way to make a RandomSortField be stored?
 I'm trying to do it for debugging purposes.
 My intention is to take a look at the values that are stored there to
 determine the sorting that is being applied to the results.

 I tried to make it a stored field as:
 <field name="randomorder" type="random" stored="true" />

 And also tried to create another text field, copying the result from the
 random field like this:
 <field name="randomorderdebug" type="text" indexed="true" stored="true"/>
 <copyField source="randomorder" dest="randomorderdebug"/>

 Neither of the approaches worked.
 Is there any restriction on this kind of field that prevents it from being
 displayed in the results?

 Thanks,
 Alexandre



Re: Storing RandomSortField

2010-05-19 Thread Leonardo Menezes
Hey,
   for random sorting, random values are generated at runtime, using the seed
you passed as one of the parameters (among other things) to generate each
value. This way, if the value you use as seed is the same in different
requests, the sorting order should be the same. You could also, for debugging
purposes, edit the random sort field class and put some traces in there, so
it could print the id of the document and the value generated, for example.
But the values won't be stored in the index.
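
To illustrate the point about seeds, here is a minimal Java sketch. It is not
Solr's RandomSortField code: the class name, the sortKey helper and its hash
mixing are made up for illustration. It only shows the idea that a sort key
derived from (seed, document id) gives the same order whenever the seed is the
same, and it prints each document id with its generated value, much like the
traces suggested above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SeededRandomSortSketch {

    // Hypothetical helper: derives a pseudo-random sort key from a seed and a document id.
    // The mixing steps are illustrative only and are not taken from Solr.
    public static int sortKey(String seed, int docId) {
        int h = seed.hashCode() + docId;
        h ^= (h >>> 16);
        h *= 0x45d9f3b;
        h ^= (h >>> 16);
        return h;
    }

    // Returns a copy of docIds ordered by the seeded key; the input list is not modified.
    public static List<Integer> sortedDocs(List<Integer> docIds, String seed) {
        List<Integer> copy = new ArrayList<>(docIds);
        copy.sort(Comparator.comparingInt((Integer id) -> sortKey(seed, id)));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> docs = List.of(1, 2, 3, 4, 5);
        // Debug trace: document id and the value generated for it.
        for (int id : docs) {
            System.out.println("doc " + id + " -> " + sortKey("seed_a", id));
        }
        System.out.println(sortedDocs(docs, "seed_a"));
        System.out.println(sortedDocs(docs, "seed_a")); // same seed, so the same order again
        System.out.println(sortedDocs(docs, "seed_b")); // another seed, usually another order
    }
}
```

Nothing is written to the index here either; the keys exist only while the
sort runs, which matches why the field cannot be stored.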

cheers

On Wed, May 19, 2010 at 10:00 AM, Marco Martinez 
mmarti...@paradigmatecnologico.com wrote:

 Hi Alexandre,

 I am not totally sure about this, but the random sort field is only used to
 do a random sort on your searches, and you have to pass different values to
 get different sorts. It only applies at search time, so no value is
 indexed. You will find more information here:

 http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html




Re: Solr Architecture discussion

2010-05-19 Thread rabahb

Do you have any insights that could help me and other people that might be
interested in that discussion?
Thanks.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p828658.html
Sent from the Solr - User mailing list archive at Nabble.com.


Custom sorting

2010-05-19 Thread dan sutton
Hi,

I have a requirement to do the following:

For up to the first 10 results (i.e. only on the first page) show
sponsored category ads, in order of bid, but no more than 2 per category,
and only if all sponsored category ads score more than min% of the highest
score. E.g. if I had the following:

min% =1


doc  score  bid  cat_id  sponsored
  1    100    x       x          0
  2     55    x       x          0

  3     50    2       2          1
  4     20    2       2          1
  5     05    2       2          1

  6     80    1       1          1
  7     70    1       1          1
  8     60    1       1          1

x = don't care

sorted order would be:

3
4

6
7

1
8
2
5

I'm not sure if this can be implemented with a custom comparator, as I
need access to the final score to enforce min%. I'm thinking I'm
probably going to have to implement a subclass of QParserPlugin with a
custom sort, but was wondering if there were alternatives?
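
One alternative worth sketching is to reorder the first page after the normal
search has returned, outside of any Solr plugin. The Java below is only a
sketch under assumptions: the Hit class and the order method are hypothetical
names, "x" is represented as 0, and the min% rule is read as "each sponsored
ad must score more than min% of the highest score". Ads are taken by bid
(then score), at most 2 per category and at most 10 in total; everything else
follows by score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SponsoredOrderingSketch {

    /** One search hit; the fields mirror the columns of the table above. */
    public static final class Hit {
        final int doc;
        final double score;
        final int bid;           // 0 stands in for "x" (don't care)
        final int catId;         // 0 stands in for "x" (don't care)
        final boolean sponsored;

        public Hit(int doc, double score, int bid, int catId, boolean sponsored) {
            this.doc = doc;
            this.score = score;
            this.bid = bid;
            this.catId = catId;
            this.sponsored = sponsored;
        }
    }

    public static List<Integer> order(List<Hit> hits, double minPercent,
                                      int maxAds, int maxPerCategory) {
        double maxScore = 0;
        for (Hit h : hits) {
            maxScore = Math.max(maxScore, h.score);
        }
        double threshold = maxScore * minPercent / 100.0;

        // Sponsored hits above the threshold, highest bid first, then highest score.
        List<Hit> candidates = new ArrayList<>();
        for (Hit h : hits) {
            if (h.sponsored && h.score > threshold) {
                candidates.add(h);
            }
        }
        candidates.sort(Comparator.comparingInt((Hit h) -> h.bid).reversed()
                .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed()));

        // Pick ads while respecting the per-category and total limits.
        List<Hit> ads = new ArrayList<>();
        Map<Integer, Integer> usedPerCategory = new HashMap<>();
        for (Hit h : candidates) {
            int used = usedPerCategory.getOrDefault(h.catId, 0);
            if (ads.size() < maxAds && used < maxPerCategory) {
                ads.add(h);
                usedPerCategory.put(h.catId, used + 1);
            }
        }

        // Everything that was not picked as an ad follows in plain score order.
        List<Hit> rest = new ArrayList<>(hits);
        rest.removeAll(ads);
        rest.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());

        List<Integer> result = new ArrayList<>();
        for (Hit h : ads) result.add(h.doc);
        for (Hit h : rest) result.add(h.doc);
        return result;
    }

    /** The eight documents from the table, with min% = 1, 10 ad slots, 2 per category. */
    public static List<Integer> exampleOrder() {
        List<Hit> hits = List.of(
                new Hit(1, 100, 0, 0, false),
                new Hit(2, 55, 0, 0, false),
                new Hit(3, 50, 2, 2, true),
                new Hit(4, 20, 2, 2, true),
                new Hit(5, 5, 2, 2, true),
                new Hit(6, 80, 1, 1, true),
                new Hit(7, 70, 1, 1, true),
                new Hit(8, 60, 1, 1, true));
        return order(hits, 1.0, 10, 2);
    }

    public static void main(String[] args) {
        System.out.println(exampleOrder()); // [3, 4, 6, 7, 1, 8, 2, 5]
    }
}
```

This reproduces the sorted order listed above for the sample data. It only
works if the first page is built from enough hits to contain all candidate
ads, and two categories with the same bid would interleave by score, which may
or may not be what is wanted.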

Many thanks in advance.
Dan


Re: TikaEntityProcessor on Solr 1.4?

2010-05-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess it should work because TikaEntityProcessor does not use any
new 1.4 APIs

On Wed, May 19, 2010 at 1:17 AM, Sixten Otto six...@sfko.com wrote:
 Sorry to repeat this question, but I realized that it probably
 belonged in its own thread:

 The TikaEntityProcessor class that enables DataImportHandler to
 process business documents was added after the release of Solr 1.4,
 along with some other changes (like the binary DataSources) to support
 it. Obviously, there hasn't been an official release of Solr since
 then. Has anyone tried back-porting those changes to Solr 1.4?

 (I do see that the question was asked last month, without any
 response: http://www.lucidimagination.com/search/document/5d2d25bc57c370e9)

 The patches for these issues don't seem all that complex or pervasive,
 but it's hard for me (as a Solr n00b) to tell whether this is really
 all that's involved:
 https://issues.apache.org/jira/browse/SOLR-1583
 https://issues.apache.org/jira/browse/SOLR-1358

 Sixten




-- 
Noble Paul | Systems Architect| AOL | http://aol.com


Re: Moving from Lucene to Solr?

2010-05-19 Thread findbestopensource
Hi Peter,

Use Lucene:

   - To have more control
   - When you cannot depend on any web server
   - To use term vectors, TermDocs, etc.
   - To easily extend it with your own Analyzer

Use Solr:

   - To index and search docs easily while writing little code
   - Solr is a standalone app and takes care of most of the stuff, like
   optimizing, warming up the reader, etc.
   - Solr can be extended to multiple nodes
   - To use faceting

If you are developing your client in Java and want to use Solr, then I would
advise using SolrJ, as it is easy and you don't need to care about the HTTP
stuff. I use Solr through SolrJ in my project www.findbestopensource.com

Regards
Aditya
www.findbestopensource.com



On Wed, May 19, 2010 at 4:08 PM, Peter Karich peat...@yahoo.de wrote:

 Hi all,

 while asking a question on stackoverflow [1] some other questions came up:
 Is SolrJ a recommended way to access Solr or should I prefer the HTTP
 interface?

 How can I (j)unit-test Solr? (e.g. create+delete index via Java call)

 Is Lucene faster than Solr? ... do you have experiences, preferably with
 the same index?

 The background is an application which uses Lucene at the moment, but I
 really need the faceting feature of Solr and I don't want to implement
 it in Lucene myself.

 Regards,
 Peter.

 [1]

 http://stackoverflow.com/questions/2856427/situations-to-prefer-apache-lucene-over-solr




Re: Deduplication

2010-05-19 Thread Ahmet Arslan

 Basically, for some use cases I would like to show duplicates; for others I
 want them ignored.
 
 If I have overwriteDupes=false and I just create the dedup hash, how can I
 query for only unique hash values, i.e. something like a SQL GROUP BY?

TermsComponent maybe? 

or faceting? 
q=*:*&facet=true&facet.field=signatureField&defType=lucene&rows=0&start=0

if you append facet.mincount=1 to the above URL you can see your duplicates
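
If the request is built from Java rather than typed into a browser, the
parameters need URL encoding and '&' separators. The sketch below uses only the
JDK (it needs Java 10 or later for this URLEncoder overload); the class name,
the helper methods and the http://localhost:8983/solr/select address are
assumptions for illustration, and signatureField is the field name used in the
faceting request above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetQuerySketch {

    // Joins the parameters as name=value pairs separated by '&', URL-encoding both parts.
    public static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // The faceting request from the message, with facet.mincount appended.
    public static String duplicateFacetQuery() {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("facet.field", "signatureField");
        params.put("defType", "lucene");
        params.put("rows", "0");
        params.put("start", "0");
        params.put("facet.mincount", "1");
        return buildQuery(params);
    }

    public static void main(String[] args) {
        // Hypothetical host, port and path; adjust to the actual Solr instance.
        System.out.println("http://localhost:8983/solr/select?" + duplicateFacetQuery());
    }
}
```

With rows=0 only the facet counts come back, one entry per signature value, so
the response stays small even on a large index.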


  


Re: Deduplication

2010-05-19 Thread Ahmet Arslan
 TermsComponent maybe? 
 
 or faceting?
 q=*:*&facet=true&facet.field=signatureField&defType=lucene&rows=0&start=0
 
 if you append facet.mincount=1 to the above URL you can
 see your duplicates
 

After re-reading your message: sometimes you want to show duplicates, and
sometimes you don't. I have never used FieldCollapsing myself, but I have
heard about it many times.

http://wiki.apache.org/solr/FieldCollapsing


  


Re: jmx issue with solr

2010-05-19 Thread Jean-Sebastien Vachon
Hi,

Try adding these options...

-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false


On 2010-05-19, at 3:44 AM, Na_D wrote:

 
 Hi,
 
 I am trying to start solr with the following command :
 
 java -Dsolr.solr.home=./example-DIH/solr/ -Dcom.sun.management.jmxremote
 -Dcom.sun.management.jmxremote.port=3000
 
 
 On doing so an error is reported :
 
 Error: Password file read access must be restricted: C:\Program Files\Java\jdk1.6.0_18\jre\lib\management\jmxremote.password
 
 
 The jmxremote.password file is there in the lib\management folder and it
 has been set to read-only, but the error persists. I am using Windows XP SP3
 Version 2002, mentioning it in case it is of any help.
 Please do put in your suggestions.



Re: Storing RandomSortField

2010-05-19 Thread Alexandre Rocco
Leonardo,

I was able to use the feature with a dynamic field, as pointed out in the
documentation.
I was just curious to take a peek at the values that are generated, even
when the field is not dynamic, so I tried to figure out a way to do so.
Maybe some output when the debug query is enabled would be useful, but it
seems it's not implemented yet.
I will try to take a look at the classes and see what I can do about it.

Thanks!

On Wed, May 19, 2010 at 5:34 AM, Leonardo Menezes 
leonardo.menez...@googlemail.com wrote:

 Hey,
    for random sorting, random values are generated at runtime, using the seed
 you passed as one of the parameters (among other things) to generate each
 value. This way, if the value you use as seed is the same in different
 requests, the sorting order should be the same. You could also, for debugging
 purposes, edit the random sort field class and put some traces in there, so
 it could print the id of the document and the value generated, for example.
 But the values won't be stored in the index.

 cheers




Re: jmx issue with solr

2010-05-19 Thread Na_D

Thanks for the info, using the above properties solved the issue.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/jmx-issue-with-solr-tp828478p829057.html
Sent from the Solr - User mailing list archive at Nabble.com.


defaultSearchField

2010-05-19 Thread Antonello Mangone
Hi to everyone, I'd like to know if it's possible to use
*defaultSearchField* on multiple fields?

i.e.

<defaultSearchField>field1, field2, field3</defaultSearchField>


Thank you all


Re: defaultSearchField

2010-05-19 Thread Ahmet Arslan
 Hi to everyone, I'd like to know if it's possible to use
 *defaultSearchField* on multiple fields?
 
 i.e.
 
 <defaultSearchField>field1, field2, field3</defaultSearchField>
 

No. But you can query multiple fields using dismax. 

qf=field1,field2,field3&defType=dismax

http://wiki.apache.org/solr/DisMaxRequestHandler


  


Re: defaultSearchField

2010-05-19 Thread Jan Kammer
There is something called dismax-requesthandler. I think this is what 
you are looking for.


greetz, Jan


On 19.05.2010 15:47, Antonello Mangone wrote:

Hi to everyone, I'd like to know if it's possible to use
*defaultSearchField* on multiple fields?

i.e.

<defaultSearchField>field1, field2, field3</defaultSearchField>


Thank you all

   




Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread hkmortensen

I do product search. Each base product exists in variants as well. One
variant has a glass door, another a steel door, etc. The variants can have
different prices. The base product does not really exist; only the variants
exist IRL. The case corresponds to cars: the car model is the base product,
with color variants, or with automatic/manual, etc.

I want to search for variants, but I only want to have base products in the
result. I.e. when one or more variants from the same base product are found,
only the base product shall be in the search result.

Does somebody have an idea how this could be done?

Best regards

Henning
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Challenge-Searching-for-variant-products-and-get-basic-products-in-result-set-tp829218p829218.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH. behavior after a import. Log, delete table !?

2010-05-19 Thread stockii

hey, thx

I did everything you said.

I created a JAR file. This JAR file deletes my table.

But Solr absolutely doesn't want to start this JAR. I put a run.bat file into
the folder where my JAR is saved. The batch file runs and deletes the table,
but when Solr starts the batch file, it doesn't work. I don't know why.
I tested the batch file in different ways and it should work... help ^^

Windows XP for testing ;-)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-behavior-after-a-import-Log-delete-table-tp823232p829230.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread Leonardo Menezes
Would you then need to know in which variant your match was produced?
Because if not, you can just index the whole thing as one single document...

On Wed, May 19, 2010 at 4:23 PM, hkmortensen ko...@yahoo.com wrote:


 I do product search. Each base product exists in variants as well. One
 variant has a glass door, another a steel door, etc. The variants can have
 different prices. The base product does not really exist; only the variants
 exist IRL. The case corresponds to cars: the car model is the base product,
 with color variants, or with automatic/manual, etc.

 I want to search for variants, but I only want to have base products in the
 result. I.e. when one or more variants from the same base product are found,
 only the base product shall be in the search result.

 Does somebody have an idea how this could be done?

 Best regards

 Henning



Re: Embedded Server, Caching, Stats page updates

2010-05-19 Thread Antoniya Statelova

 The way you phrased that paragraph makes me think that one of us doesn't
 understand what exactly you did when you switched ...


"Switched" works for the specific setup I'm using: the server would refer
to itself in the CommonsHttpSolrServer request sent, i.e. it would run both
the server and client sides. Removing this and simply using
EmbeddedSolrServer just made the setup a little more sane in that respect.
Does that make more sense now?


 Now for starters: if the remote server you were running solr on is more
 powerful than the local machine you are running your java application on,
 that alone could explain some performance differences (likewise for JVM
 settings).

The machine I'm running it on is exactly the same: the code change was
pushed and I measured performance before and after. The same load was
observed (since it's a testing machine I could regulate that). That's why I
was so surprised that removing that additional HTTP request didn't bring an
improvement.


 Most importantly: when running solr embedded in your application, there is
 no stats.jsp page for you to look at -- because solr is no longer
 running in a servlet container.  so if you are seeing stats on your
 solr server that say your caches aren't being hit, the reason is because
 the server isn't being hit at all.


This is nice to know, I didn't look into how the actual page was generated.
I expected something like this to be true. Thank you!


 When running an embedded solr server, the filterCache and queryResultCache
 will still be used.  the settings in the solrconfig.xml you specify when
 initializing the SolrCore will be honored.  you can use JMX to monitor
 those cache hit rates (assuming you have JMX enabled for your application,
 and the appropriate setting is in your solrconfig.xml)

I'll look into using JMX, thanks for the suggestion.

Tony


Re: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread hkmortensen

Thanks. Currently not, but requirements change all the time, as always ;-)
If we get a requirement that a facet shall be the material of the doors, we
will need to know which variant was the hit. I would like to be prepared for
that.




Leonardo Menezes wrote:
 
 Would you then need to know in which variant your match was produced?
 Because if not, you can just index the whole thing as one single
 document...
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Challenge-Searching-for-variant-products-and-get-basic-products-in-result-set-tp829218p829319.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread Leonardo Menezes
If that is so, and you have, for example, two variants of a car with
automatic, what would define which one was the hit? Or do the fields not
share common information across variants? If they do share it, you wouldn't
be able to determine which one was the hit (because it was on both of them)
and would either have to pick one randomly, or retrieve both. If they don't
share that info, you would have that covered, since only one would match any
given query.

On Wed, May 19, 2010 at 5:04 PM, hkmortensen ko...@yahoo.com wrote:


 Thanks. Currently not, but requirements change all the time, as always ;-)
 If we get a requirement that a facet shall be the material of the doors, we
 will need to know which variant was the hit. I would like to be prepared for
 that.







RE: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread Nagelberg, Kallin
I agree that pulling all attributes into the parent sku during indexing could 
work well. Define a Boolean field like 'isVirtual' to identify the non-leaf 
skus, and use a multi-valued field for each of the attributes. For now you can 
do a search like (isVirtual:true AND doorType:screen). If at a later date you 
want the actual variants just search for isVirtual:false.

Does that work?

-Kallin Nagelberg

-Original Message-
From: Leonardo Menezes [mailto:leonardo.menez...@googlemail.com] 
Sent: Wednesday, May 19, 2010 11:13 AM
To: solr-user@lucene.apache.org
Subject: Re: Challenge: Searching for variant products and get basic products 
in result set

If that is so, and you have, for example, two variants of a car with
automatic, what would define which one was the hit? Or do the fields not
share common information across variants? If they do share it, you wouldn't
be able to determine which one was the hit (because it was on both of them)
and would either have to pick one randomly, or retrieve both. If they don't
share that info, you would have that covered, since only one would match any
given query.




Re: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread hkmortensen

You are right, in that case an arbitrary one would have to be chosen, or
probably both should be in the result set. Difficult to say what the
marketing department would like ;-)



Leonardo Menezes wrote:
 
 If that is so, and you have, for example, two variants of a car with
 automatic, what would define which one was the hit? Or do the fields not
 share common information across variants? If they do share it, you wouldn't
 be able to determine which one was the hit (because it was on both of them)
 and would either have to pick one randomly, or retrieve both. If they don't
 share that info, you would have that covered, since only one would match any
 given query.
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Challenge-Searching-for-variant-products-and-get-basic-products-in-result-set-tp829218p829413.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread hkmortensen

Sorry, what does sku mean?

I understand you like this: index the base and the variants, and include all
attributes (for one base and its variants) in each document. I think that
would work. Thanks.


Nagelberg, Kallin wrote:
 
 I agree that pulling all attributes into the parent sku during indexing
 could work well. Define a Boolean field like 'isVirtual' to identify the
 non-leaf skus, and use a multi-valued field for each of the attributes.
 For now you can do a search like (isVirtual:true AND doorType:screen). If
 at a later date you want the actual variants just search for
 isVirtual:false.
 
 Does that work?
 
 -Kallin Nagelberg
 
 -Original Message-
 From: Leonardo Menezes [mailto:leonardo.menez...@googlemail.com] 
 Sent: Wednesday, May 19, 2010 11:13 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Challenge: Searching for variant products and get basic
 products in result set
 
 if that is so, and maybe, you have for example, two variants of cars with
 automatic, what would define on which one was the hit? or field dont share
 common information across variants? if they do share, you wouldnt be able
 to
 define in which one was the hit(because it was on both of them) and would
 either have to pick one randomly, or retrieve both. if they dont share
 that
 info, you would have that covered, since only one would match any given
 query.
 
 On Wed, May 19, 2010 at 5:04 PM, hkmortensen ko...@yahoo.com wrote:
 

 thanks. Currently not, but requirements change all the time as always ;-)
 If we get a requirement, that a facet shall be material of doors, we
 will
 need to know which variant was the hit. I would like to be prepared for
 that.




 Leonardo Menezes wrote:
 
  would you then need to know in which variant was your match produced?
  because if not, you can just index the whole thing as one single
  document...
 
  On Wed, May 19, 2010 at 4:23 PM, hkmortensen ko...@yahoo.com wrote:
 
 
  I am building a product search. Each base product exists in variants as
  well. One variant has a glass door, another a steel door, etc. The variants
  can have different prices. The base product does not really exist; only the
  variants exist IRL. The case corresponds to cars: the car model is the base
  product, with color variants or with automatic/manual, etc.

  I want to search for variants, but I only want to have base products in
  the result. I.e., when one or more variants from the same base product are
  found, only the base product shall be in the search result.

  Does somebody have an idea how this could be done?
 
  Best regards
 
  Henning
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Challenge-Searching-for-variant-products-and-get-basic-products-in-result-set-tp829218p829218.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Challenge-Searching-for-variant-products-and-get-basic-products-in-result-set-tp829218p829319.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Challenge-Searching-for-variant-products-and-get-basic-products-in-result-set-tp829218p829435.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread Nagelberg, Kallin
Sorry, in North America 'sku' (stock keeping unit) is the common term in 
business to specifically identify a particular product, 
http://lmgtfy.com/?q=sku. 

And yes, I think you understand me. I am imagining you can structure your 
products in a hierarchy. For each node in the tree you traverse all children, 
collecting their attributes into the current node.

-Kallin Nagelberg
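
The roll-up described above can be sketched roughly as follows. This is only a minimal illustration in Python, not Solr code: the product tree, the ids ("fridge", "fridge-glass", "fridge-steel") and the helper names roll_up and matches are hypothetical, while isVirtual and doorType are the field names used in this thread.

```python
def roll_up(node, docs):
    """Append one index document per node to docs; return the node's merged attributes."""
    # Start with the node's own attributes, each held as a set so values can accumulate.
    merged = {name: {value} for name, value in node.get("attributes", {}).items()}
    children = node.get("children", [])
    for child in children:
        # Pull every attribute value found anywhere below this node up into it.
        for name, values in roll_up(child, docs).items():
            merged.setdefault(name, set()).update(values)
    doc = {"id": node["id"], "isVirtual": bool(children)}  # non-leaf skus are "virtual"
    for name, values in merged.items():
        doc[name] = sorted(values)  # stands in for a multi-valued field
    docs.append(doc)
    return merged


def matches(doc, is_virtual, field, value):
    """Rough stand-in for a query such as (isVirtual:true AND doorType:glass)."""
    return doc["isVirtual"] == is_virtual and value in doc.get(field, [])


# Hypothetical base product with two variants.
base = {
    "id": "fridge",
    "children": [
        {"id": "fridge-glass", "attributes": {"doorType": "glass"}},
        {"id": "fridge-steel", "attributes": {"doorType": "steel"}},
    ],
}

docs = []
roll_up(base, docs)
base_hits = [d["id"] for d in docs if matches(d, True, "doorType", "glass")]
variant_hits = [d["id"] for d in docs if matches(d, False, "doorType", "glass")]
print(base_hits, variant_hits)
```

Searching with isVirtual true returns only the base product, and the same search with isVirtual false returns the individual variants, which is the switch suggested earlier in the thread.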



Re: DIH. behavior after a import. Log, delete table !?

2010-05-19 Thread Ahmet Arslan
 I created a jar file. This jar file deletes my table.
 
 But Solr absolutely does not want to start this jar. I put a run.bat file
 into the folder where my jar is saved. When I run this batch file myself
 it deletes the table, but when Solr starts it, it doesn't work, and I
 don't know why. I tested the batch file in different ways and it should
 work... help ^^
 
 Windows XP for testing ;-)

I don't know why, but it seems that dir needs to be set to something other than '.'.
Anyway, I got it working on Windows in two ways:

1-)
<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">java</str>
    <str name="dir">solr/bin</str>
    <arr name="args"> <str>-jar</str> <str>junk.jar</str> </arr>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>

2-) Giving full paths:

<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">C:\test.bat</str>
    <str name="dir">C:\</str>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>

It should work this time on Windows.


  


RE: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread hkmortensen

Yes, I think that will make a good solution. In Danish, sku is a bad word
;-), but thanks for the info.



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Challenge-Searching-for-variant-products-and-get-basic-products-in-result-set-tp829218p829530.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Delta Queries

2010-05-19 Thread Vladimir Sutskever
I have an indexed_timestamp field in my index, which lets me know when a
document was indexed:

<field name="indexed_timestamp" type="date" indexed="true" stored="true"
       default="NOW" multiValued="false"/>


For some reason when doing delta indexing via DIH, this field is not being 
updated.

Are timestamp fields updated during DELTA updates?



Kind regards,

Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.




Re: index merge

2010-05-19 Thread uma m

Hi All,

  I am running Solr on a 64-bit HP-UX system. The total index size is about
5GB, and when I try to load any new document, Solr tries to merge the existing
segments first, which results in the following error. I can see a temp file
growing within the index dir to around 2GB in size, and later it fails with
this exception. It looks like the exception occurs on reaching
Integer.MAX_VALUE.
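
A quick check of the arithmetic behind that guess, as a small Python sketch. It only shows that the observed 2GB figure coincides with the largest signed 32-bit value; it does not establish that this limit is what triggers the error.

```python
INT32_MAX = 2**31 - 1  # value of Java's Integer.MAX_VALUE
GIB = 2**30            # bytes in one GiB

print(INT32_MAX)        # 2147483647
print(INT32_MAX / GIB)  # just under 2.0, i.e. roughly the 2GB seen for the temp file
```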

Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: File too large (errno:27)
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
Caused by: java.io.IOException: File too large (errno:27)
    at java.io.RandomAccessFile.writeBytes(Native Method)
    at java.io.RandomAccessFile.write(RandomAccessFile.java:456)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
    at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
    at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
    at org.apache.lucene.store.BufferedIndexOutput.close(BufferedIndexOutput.java:109)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.close(SimpleFSDirectory.java:199)
    at org.apache.lucene.index.FieldsWriter.close(FieldsWriter.java:144)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:357)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:153)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5029)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)

---

The solrconfig.xml contains the default values for the indexDefaults and
mainIndex sections, as below.

  <indexDefaults>
    <!-- Values here affect all index writers and act as a default unless
         overridden. -->
    <useCompoundFile>false</useCompoundFile>

    <mergeFactor>10</mergeFactor>
    <!-- If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene
         will flush based on whichever limit is hit first.  -->
    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->

    <!-- Sets the amount of RAM that may be used by Lucene indexing
         for buffering added documents and deletions before they are
         flushed to the Directory.  -->
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <!-- <maxMergeDocs>2147483647</maxMergeDocs> -->
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>
    <!--<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>-->
    <!--<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>-->
  </indexDefaults>
  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <!-- Deprecated -->
    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
    <!--<maxMergeDocs>2147483647</maxMergeDocs>-->
  </mainIndex>


Could anyone help me to resolve this exception?

Regards,
Uma
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/index-merge-tp472904p829810.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: index merge

2010-05-19 Thread Ahmet Arslan
 I am running Solr on a 64-bit HP-UX system. The total index size is about
 5GB, and when I try to load any new document, Solr tries to merge the
 existing segments first, which results in the following error. I can see a
 temp file growing within the index dir to around 2GB in size, and later it
 fails with this exception. It looks like the exception occurs on reaching
 Integer.MAX_VALUE.

<ramBufferSizeMB>32</ramBufferSizeMB>: isn't a 32MB ramBufferSizeMB too small?



  


The Seven Deadly Sins of Solr spanish translation

2010-05-19 Thread Juan Pedro Danculovic
Hello, I translated this article into Spanish. It is very helpful for avoiding
common mistakes in Solr installations.

http://www.linebee.com/?p=434lang=es

Thanks,

Juan


Re: Custom sorting

2010-05-19 Thread Daniel Cassiano
Hi Dan,

It seems that you want a SearchComponent[1], something like the
QueryElevationComponent[2].
Take a look at how it works and I think you can build your custom solution.

[1]-
http://lucene.apache.org/solr/api/org/apache/solr/handler/component/SearchComponent.html
[2]- http://wiki.apache.org/solr/QueryElevationComponent


Cheers,

-- Daniel Cassiano

http://dcassiano.wordpress.com


On Wed, May 19, 2010 at 6:46 AM, dan sutton danbsut...@gmail.com wrote:

 Hi,

 I have a requirement to do the following:

 For up to the first 10 results (i.e. only on the first page), show
 sponsored category ads, in order of bid, but no more than 2 per category,
 and only if all sponsored category ads are more than min% of the highest
 score. E.g., if I had the following:

 min% =1


 doc  score  bid  cat_id  sponsored
  1    100    x     x        0
  2     55    x     x        0

  3     50    2     2        1
  4     20    2     2        1
  5     05    2     2        1

  6     80    1     1        1
  7     70    1     1        1
  8     60    1     1        1

 x = dont care

 sorted order would be:

 3
 4

 6
 7

 1
 8
 2
 5

 I'm not sure if this can be implemented with a custom comparator, as I
 need access to the final score to enforce min%. I'm thinking I'm
 probably going to have to implement a subclass of QParserPlugin with a
 custom sort, but was wondering if there are alternatives?

 Many thanks in advance.
 Dan
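
To make the requested ordering concrete, here is a minimal Python sketch of the rule applied to the example rows above. It is not a SearchComponent and uses no Solr API; the function name rank and its parameters are hypothetical, and it assumes one reading of the min% rule (a sponsored ad is eligible only if its score exceeds min% of the highest score), which the thread does not confirm.

```python
from collections import defaultdict

# The example rows from the message above; bid/cat_id are None where marked "x".
DOCS = [
    {"doc": 1, "score": 100, "bid": None, "cat_id": None, "sponsored": 0},
    {"doc": 2, "score": 55, "bid": None, "cat_id": None, "sponsored": 0},
    {"doc": 3, "score": 50, "bid": 2, "cat_id": 2, "sponsored": 1},
    {"doc": 4, "score": 20, "bid": 2, "cat_id": 2, "sponsored": 1},
    {"doc": 5, "score": 5, "bid": 2, "cat_id": 2, "sponsored": 1},
    {"doc": 6, "score": 80, "bid": 1, "cat_id": 1, "sponsored": 1},
    {"doc": 7, "score": 70, "bid": 1, "cat_id": 1, "sponsored": 1},
    {"doc": 8, "score": 60, "bid": 1, "cat_id": 1, "sponsored": 1},
]


def rank(docs, min_pct=1.0, per_category=2, max_ads=10):
    """Return doc ids: eligible sponsored ads first, then everything else by score."""
    threshold = max(d["score"] for d in docs) * min_pct / 100.0

    # Group eligible sponsored ads by category (assumed reading of the min% rule).
    by_cat = defaultdict(list)
    for d in docs:
        if d["sponsored"] and d["score"] > threshold:
            by_cat[d["cat_id"]].append(d)

    # Categories in order of bid (highest first); at most per_category ads each.
    ads = []
    for cat in sorted(by_cat, key=lambda c: -max(d["bid"] for d in by_cat[c])):
        ads.extend(sorted(by_cat[cat], key=lambda d: -d["score"])[:per_category])
    ads = ads[:max_ads]

    # Everything not promoted falls back to plain score order.
    promoted = {d["doc"] for d in ads}
    rest = sorted((d for d in docs if d["doc"] not in promoted), key=lambda d: -d["score"])
    return [d["doc"] for d in ads + rest]


print(rank(DOCS))  # [3, 4, 6, 7, 1, 8, 2, 5]
```

The sketch needs the highest score before it can place any ad, which matches the concern that a plain comparator does not have enough information on its own.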



Re: disable caches in real time

2010-05-19 Thread Chris Hostetter

: I've always understood that if you do a commit (replication does it), a new
: searcher is opened, and you lose performance (queries per second) while the
: caches are regenerated. I think I didn't explain my situation correctly

not if you configure your caches with autowarming -- then solr will warm 
up the new caches (on the new index) while the old index still serves 
requests -- this is all managed for you by the SolrCore, no need for core 
swapping.


-Hoss



RE: disable caches in real time

2010-05-19 Thread Nagelberg, Kallin
I suppose you are still losing some performance on the replicated box since it 
needs to use some resources to warm the cache. It would be nice if a warmed 
cache could be replicated from the master though perhaps that's not practical. 
Chris is right though: The newly updated index created by a commit is not seen 
by users until it has been warmed, at which point it is atomically swapped.

-Kallin Nagelberg






Stemming Filters in wiki

2010-05-19 Thread Asif Rahman
I see that the entries for PorterStemFilterFactory,
EnglishPorterFilterFactory, and SnowballPorterFilterFactory have been
removed from the Analyzers, Tokenizers, and Token Filters wiki page.  Is
there a reason for this?

Thanks,

asif


-- 
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com


Re: Embedded Server, Caching, Stats page updates

2010-05-19 Thread Chris Hostetter

: Switched works for the specific setup i'm using - the server would refer
: to itself in the CommonHttpSolrServer request sent, i.e. it would run both
: the server and client sides. Removing this and simply using
: EmbeddedSolrServer just made the setup a little more sane in that aspect.
: Does that make more sense now?

not really ... what *exactly* did you change about your setup and 
your client code?  please be specific -- how did you run solr
before when you were using CommonsHttpSolrServer? what are *all* of the 
steps you did when you switched to EmbeddedSolrServer (specifically: what 
did the changes to your java client code look like, and what did you 
change about how you run solr)?

Because if you still have the solr.war running in your servlet container, 
and all you did is edit your java code to use EmbeddedSolrServer (pointing 
at the same directory on disk) instead of CommonsHttpSolrServer, then you 
are now running *two* instances of Solr in your VM, both reading from the 
same indexes.


-Hoss



Re: Stemming Filters in wiki

2010-05-19 Thread Robert Muir
Hi Asif,

These entries were moved here: http://wiki.apache.org/solr/LanguageAnalysis




-- 
Robert Muir
rcm...@gmail.com


Re: Moving from Lucene to Solr?

2010-05-19 Thread Chris Hostetter

: Subject: Moving from Lucene to Solr?
: References: aanlktimxy1wscs_bjzkkkdy7dlrw1iober5kzszrf...@mail.gmail.com
: In-Reply-To: aanlktimxy1wscs_bjzkkkdy7dlrw1iober5kzszrf...@mail.gmail.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
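
As a rough illustration of why changing only the subject line is not enough, the Python sketch below builds two messages with the standard email package and checks the threading headers mentioned in the quoted lines above (References and In-Reply-To). The message id, the old subject, and the helper name starts_new_thread are made up for the example.

```python
from email.message import EmailMessage


def starts_new_thread(msg):
    """True when the message carries no headers tying it to an earlier thread."""
    return msg["In-Reply-To"] is None and msg["References"] is None


# A reply to an old message with only the subject changed: still tied to the old thread.
hijack = EmailMessage()
hijack["Subject"] = "Moving from Lucene to Solr?"
hijack["In-Reply-To"] = "<old-thread-id@example.org>"
hijack["References"] = "<old-thread-id@example.org>"

# A fresh email with the same subject: no threading headers at all.
fresh = EmailMessage()
fresh["Subject"] = "Moving from Lucene to Solr?"

print(starts_new_thread(hijack), starts_new_thread(fresh))  # False True
```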



-Hoss



Re: Stemming Filters in wiki

2010-05-19 Thread Chris Hostetter
: 
: These entries were moved here: http://wiki.apache.org/solr/LanguageAnalysis

but there doesn't seem to be a link to that page from 
AnalyzersTokenizersTokenFilters (or from anywhere on the wiki according to 
the wiki link search feature) ... so i'll add some verbiage about it.




-Hoss


Query Timings increase after system is idle

2010-05-19 Thread ST ST
Folks,

We have a problem in our environment: after the system has been idle for
about 9 hours, the query time goes up from a few hundred ms to 4+ seconds.

System Details:
 - Solr 1.4
 - 10 Million Index.
 - Use MMAP for mapping the index files in memory

Test Details:
-  8 hour performance run with ingestion (@ 8 docs/sec) , query rate - 3
Queries per sec.
-  Commit is per hour.

Issue:
- After 9 hours of idle time (ie no queries, no ingestion ) every query
takes 4+ seconds, subsequent queries are fast.

I have a few specific questions:
A. Does Lucene/Solr have internal caches which may be flushed out of memory
when the system is idle ?
B. What operations are done on a per term basis (example: build doc lists )
for first time queries.
C. Any pointers to what else may be an issue here.

Really appreciate any help you can provide.

ST


Re: Moving from Lucene to Solr?

2010-05-19 Thread Peter Karich
Sorry. Wasn't intended as a hijacking :-(





caching on unique queries

2010-05-19 Thread Kevin Osborn
Pretty much every one of my queries is going to be unique. However, the query 
is fairly complex and also contains both unique and non-unique data. In the 
query, some fields will be unique (e.g. description), but other fields will be 
fairly common (e.g. category). If we could use those common fields as filters, 
it would be easy to use the filter cache. I could just separate the filters and 
let the filter cache do its thing. Unfortunately, due to the nature of our 
application, pretty much every field is just a boost.
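
For reference, the split described above (unique clauses in the main query, common clauses as filters) would look roughly like the Python sketch below. It only builds a request query string using Solr's q and fq parameters; the helper name build_params and the example values are hypothetical, and as noted above this split does not help when the common fields must act as boosts rather than filters.

```python
from urllib.parse import urlencode


def build_params(unique_clauses, common_filters):
    """Put unique clauses in q and each reusable clause in its own fq parameter."""
    params = [("q", " ".join(unique_clauses))]
    # Each fq is a separate filter, so a repeated category filter can be reused across queries.
    params += [("fq", clause) for clause in common_filters]
    return urlencode(params)


query_string = build_params(["description:widget"], ["category:tools"])
print(query_string)  # q=description%3Awidget&fq=category%3Atools
```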

So, right now, I am getting absolutely no use out of the cache. The only cache 
that might be useful is the Document Cache. Even then I am not sure.

Is there any way to cache part of the query? Or basically cache subqueries? I 
have my own request handler, so I am willing to write the necessary code. I am 
fearful that the best performance may come from just turning off caching.


  

Subclassing DIH

2010-05-19 Thread Blargy

I am trying to subclass DIH and am having a hard time trying to get
access to the current Solr Context. How is this possible? 

Is there any way to get access to the current DataSource, DataImporter, etc.?

On a related note... when working with an onImportEnd or onImportStart, how
can I get a reference to the current Request/Response that initiated the
import? 

From the DIH subclass I can access the request/response but not the context.
From the event listener I can access the Context but not the
request/response. 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Subclassing-DIH-tp830954p830954.html
Sent from the Solr - User mailing list archive at Nabble.com.